CONIFERIN BETA-GLUCOSIDASE CDNA FOR MODIFYING LIGNIN
CONTENT IN PLANTS



Technical Field 

This invention relates to DNA molecules cloned
from plants and methods of using such DNA molecules to
produce transgenic plants with altered lignin content.



BACKGROUND 

Lignin is the second most abundant organic
material in the biosphere, and is a major component of
the cell walls of woody plants (such as poplar and pine
species) and fodder crops (such as maize, wheat and
barley). The quantity of lignin in plant material
affects agronomically important characteristics. For
example, in fodder crops the amount of lignin present
determines how easily the crop may be digested by
animals; relatively small increases in lignin content
may produce large decreases in the digestibility of the
crop. Therefore, reducing lignin content would enhance
digestibility, allowing more efficient use of such
crops. In the timber industry, producing wood pulp for
papermaking requires the removal of lignin to release
the cellulosic content of the timber. The process of
removing the lignin consumes large amounts of energy and
produces environmentally harmful lignin waste liquors,
which must be treated prior to disposal. It has also
been suggested that residual lignin in paper pulp may
produce toxic polychlorinated biphenyls when the lignin
interacts with chlorine used in the bleaching process.
Thus, decreasing lignin content in wood products would
be advantageous for papermaking. On the other hand,
increasing the lignin content of timber offers the
possibility of increased wood strength.

Accordingly, modification of the quality and
quantity of lignin in plants has been a long-standing
interest among breeders and, more recently, among
molecular biologists. Recent molecular approaches
towards methods for reducing lignin content in plants
are typified by: U.S. Patent No. 5,451,514,
"Modification of Lignin Synthesis in Plants"; Canadian
Patent No. 2,005,597, "Plants Having Reduced Lignin or
Lignin of Altered Quality"; and International Patent
Application Publication No. WO 94/23044.

Lignin is a complex polymer of three cinnamyl
alcohols (p-coumaryl, coniferyl and sinapyl alcohol), all
products of phenylpropanoid metabolism. Depending on
the plant species or tissue, the relative proportions of
the different monomers in lignin can vary significantly.
In gymnosperms, for example, lignin is predominantly
composed of coniferyl alcohol monomer units, whereas
angiosperm lignin contains significant proportions of
sinapyl moieties. Lignin production involves many
intermediates, enzymatic pathways and, correspondingly,
genes. Accordingly, there are several gene/enzyme
targets that might be selected to manipulate lignin
production through genetic engineering.
Alteration of lignin levels by antisense and
sense suppression of gene expression has already been
attempted for several enzymes in the phenylpropanoid
pathway, including PAL (Elkind et al., 1990), CAD (Schuch,
1993; Canadian Patent No. 2,005,597; U.S. Patent No.
5,451,514), 4CL (Lee and Douglas, 1994) and COMT (WO
94/23044). However, all of these attempts to modify
lignin synthesis are directed at early stages in the
synthetic pathway and are therefore likely to interfere
with other metabolic processes that share these
intermediate steps. It is clear, for example, that
interference with early steps in the phenylpropanoid
pathway can have undesirable pleiotropic effects (Elkind
et al., 1990). In addition, modulating biosynthetic
enzymes that act early in the pathway may not be
effective, because alternative synthetic routes may be
available. A better approach to modulating lignin
synthesis would be to regulate later stages in the
lignin biosynthesis pathway: this would minimize or
avoid pleiotropic effects and would likely provide a
greater degree of effective control.

It is an object of the present invention to
identify and provide a plant nucleic acid sequence that
encodes an enzyme that functions late in the pathway of
lignin biosynthesis. It is a further object of this
invention to provide vectors containing forms of this
nucleic acid sequence suitable for introduction into
plants to modify the production of lignin.
SUMMARY OF THE INVENTION

In a first aspect, the present invention
provides an isolated nucleic acid molecule comprising
at least 15 consecutive nucleotides of the sequence shown
in Seq. I.D. No. 6 and encoding a coniferin β-glucosidase
enzyme.

In a further aspect, the present invention
provides an isolated nucleic acid molecule which
encodes a coniferin β-glucosidase enzyme and which
hybridizes under conditions of at least moderate
stringency to the nucleotide sequence shown in Seq. I.D.
No. 6.

In a further aspect, the present invention
provides a coniferin β-glucosidase enzyme encoded by a
nucleic acid molecule according to the invention.

In a still further aspect, the present invention
provides an isolated oligonucleotide which comprises
at least 15 consecutive nucleotides of the sequence shown
in Seq. I.D. No. 6 or its complementary strand.
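To make the scope of this aspect concrete, the short Python sketch below enumerates every run of 15 consecutive nucleotides taken from a DNA sequence or from its complementary strand. It is a hypothetical illustration only: the function names are invented here, and the test sequence is an arbitrary placeholder, not Seq. I.D. No. 6.

```python
from typing import Set

# Map each base to its Watson-Crick partner (DNA alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def candidate_oligos(seq: str, k: int = 15) -> Set[str]:
    """Every k-mer (default 15 nt) found on either strand of `seq`."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return {s[i:i + k] for s in strands for i in range(len(s) - k + 1)}


# Usage with a made-up 21-nt placeholder (not Seq. I.D. No. 6):
oligos = candidate_oligos("ATGGCGTTACCGGATTACGAT")
print(len(oligos))  # at most 2 * (21 - 15 + 1) = 14 distinct 15-mers
```

A real probe or primer would additionally be screened for properties such as specificity and melting temperature; this sketch only lists the candidates.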

Also provided are recombinant vectors including a DNA sequence
of the invention, and transgenic plants, all as set forth in the
accompanying claim set.

In a further aspect, the present invention
provides a method of producing a plant with an
altered lignin content relative to an untransformed plant
of that species, comprising introducing into the plant a
recombinant vector that comprises a promoter operably
linked to a nucleic acid which hybridizes under
conditions of moderate stringency to the sequence shown
in Seq. I.D. No. 6 and which encodes a coniferin
β-glucosidase enzyme.

In a further aspect, the present invention
provides a method of producing a plant with an
altered lignin content relative to an untransformed plant
of that species, comprising introducing into the plant a
recombinant vector that comprises a promoter operably
linked to an antisense nucleic acid which, when expressed
in cells of the plant, inhibits the expression of a
native coniferin β-glucosidase gene.

In yet a further aspect, the present invention
provides a method of producing a plant with an
altered lignin content relative to an untransformed plant
of that species, comprising introducing into the plant a
nucleic acid molecule comprising a coding sequence
operably linked to a promoter sequence, wherein the
coding sequence encodes an untranslatable plus-sense
transcript that shares at least 90% sequence similarity
with a transcript of a native coniferin β-glucosidase
gene.

In a still further aspect, the present invention
provides an isolated nucleic acid which encodes a
coniferin β-glucosidase.

In another aspect, the present invention
provides a method of isolating a nucleotide sequence
encoding a coniferin β-glucosidase enzyme, the method
comprising hybridizing a nucleotide preparation with a
DNA molecule comprising at least 15 consecutive
nucleotides of the sequence set forth in Seq. I.D. No. 6.
BRIEF DESCRIPTION OF THE INVENTION

The inventors have determined that the gene
encoding coniferin β-glucosidase would be an excellent
target gene for modifying lignin content in plants,
particularly in trees such as conifers. The coniferin
β-glucosidase enzyme catalyzes the hydrolysis of
coniferin, the 4-O-glucoside of coniferyl alcohol, which
is one of the last steps in the biosynthesis of lignin.
Thus, the level of coniferin β-glucosidase activity
directly affects lignin synthesis and, therefore, the
quantity of lignin in the plant tissue. Coniferin
accumulates in conifer xylem during cambium
reactivation, consistent with a role as the dominant
lignin precursor in these species (Freudenberg and
Harkin, 1963; Savidge, 1989). β-Glucosidases capable of
hydrolyzing coniferin have been detected in suspension
culture systems (Hosel et al., 1982; Hosel and Todenhagen,
1980) and seedlings (Marcinowski and Grisebach, 1978),
and a coniferin β-glucosidase has been purified from
differentiating xylem in trees (Dharmawardhana et al.,
1995). However, to date, the genetic manipulation of
coniferin β-glucosidase has not been possible because
the gene encoding the enzyme has not been cloned.

To that end, the inventors have cloned and
sequenced a complementary DNA (cDNA) sequence from the
conifer tree species Pinus contorta. The provision of
this cDNA sequence enables, for the first time, the
regulation of coniferin β-glucosidase activity in plants
through genetic engineering. Specifically, the
invention provides genetic constructs, such as plant
transformation vectors, that include various forms of
the coniferin β-glucosidase cDNA or sequences that are
homologous to this cDNA. Depending on the specific
nature of these constructs, they may be introduced into
plants in order to increase or reduce the production of
the coniferin β-glucosidase enzyme, and therefore to
regulate lignin synthesis.

Transformation vectors according to this
invention preferably include a recombinant DNA sequence
that comprises all or part of the coniferin
β-glucosidase cDNA. Depending on the nature of the
promoter sequence selected, such constructs may be used
to modify lignin content throughout the plant or in a
tissue-specific manner, and either constitutively or at
certain stages of plant development. The availability
of inducible plant promoters also offers the possibility
of changing lignin biosynthesis in a plant at desired
times by application of the chemical or physical agent
that induces transcription from the promoter.

In one embodiment, transformation vectors may
be constructed to over-express the coniferin
β-glucosidase enzyme ("sense" orientation). Enhanced
lignin synthesis may be achieved by introducing such
vectors into plants. Examples of the application of
this approach to modify plant phenotypes include U.S.
Patent No. 5,268,526, "Overexpression of Phytochrome in
Transgenic Plants"; U.S. Patent No. 4,795,855,
"Transformation and Foreign Gene Expression in Woody
Species"; and U.S. Patent No. 5,443,974 (over-expression
of a stearoyl-ACP desaturase gene).

Alternatively, such over-expression vectors
may be used to suppress coniferin β-glucosidase enzyme
activity through sense suppression, as described in U.S.
Patent Nos. 5,034,323 and 5,283,194, both entitled
"Genetic Engineering of Novel Plant Phenotypes".

In another embodiment, constructs may be
designed to express plus-sense untranslatable coniferin
β-glucosidase RNA, using methodologies described in U.S.
Patent No. 5,583,021, "Production of Virus Resistant
Plants". Constructs of this type may be used to reduce
the expression of the native coniferin β-glucosidase
gene, thereby reducing coniferin β-glucosidase enzyme
activity and, as a result, lignin content.

In other embodiments, the present invention
provides genetic constructs designed to express
antisense versions of the coniferin β-glucosidase RNA.
"Antisense" RNA is an RNA sequence that is the reverse
complement of the mRNA encoded by a target gene.
Examples of the use of antisense RNA to inhibit
expression of target plant genes include U.S. Patent No.
5,451,514, "Modification of Lignin Synthesis in Plants"
(use of antisense RNA to regulate CAD); U.S. Patent No.
5,356,799, "Antisense Gene Systems of Pollination
Control for Hybrid Seed Production"; and U.S. Patent No.
5,530,192 (use of antisense RNA to alter amino acid and
fatty acid composition in plants).
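The reverse-complement relationship that defines antisense RNA can be sketched in a few lines of Python. This is an illustrative helper with an invented name, operating on a short made-up sequence; in practice an antisense construct is usually made by placing the cDNA in reverse orientation relative to the promoter, so the cell transcribes this strand itself.

```python
# Complement table for RNA bases (U pairs with A, G pairs with C).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(mrna: str) -> str:
    """Return the antisense RNA (reverse complement) of an mRNA, both 5'->3'."""
    mrna = mrna.upper()
    if set(mrna) - set("ACGU"):
        raise ValueError("expected an RNA sequence over A, C, G, U")
    return mrna.translate(RNA_COMPLEMENT)[::-1]


print(antisense("AUGGCC"))  # GGCCAU
```

Applying the function twice returns the original mRNA, which is a quick sanity check that the two strands are exact complements.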

In conjunction with these genetic constructs,
the present invention also includes methods for altering
lignin biosynthesis in plants. Generally, such methods
comprise introducing into the genome of a plant a
genetic construct that includes all or part of the
coniferin β-glucosidase cDNA (in either sense or
antisense orientation) or a sequence derived from this
cDNA. Methods for introducing transformation vectors
into plants are well known in the art and include
electroporation of plant protoplasts, liposome-mediated
transformation, polyethylene glycol-mediated
transformation, transformation using viruses,
micro-injection of plant cells, micro-projectile
bombardment of plant cells, vacuum infiltration, and
Agrobacterium tumefaciens (AT)-mediated transformation.
Methods particularly suited to the transformation of
woody species are described in Ellis et al. (1993),
Ellis et al. (1996), U.S. Patent No. 5,122,466,
"Ballistic Transformation of Conifer", and U.S. Patent
No. 4,795,855, "Transformation and Foreign Gene
Expression with Woody Species".

The invention also includes transformed plants
having altered lignin compositions as a result of being
transformed with a genetic construct as described above.
Examples of plants that may be transformed in this
manner include conifers, such as plants from the genera
Picea, Pseudotsuga, Tsuga, Sequoia, Abies, Thuja,
Libocedrus, Chamaecyparis and Larix. Pines are expected
to be a particularly suitable choice for genetic
modification by the methods disclosed herein, including
loblolly pine (Pinus taeda), slash pine (Pinus
elliottii), longleaf pine (Pinus palustris), shortleaf
pine (Pinus echinata), jack pine (Pinus banksiana),
ponderosa pine (Pinus ponderosa), red pine (Pinus
resinosa), Eastern white pine (Pinus strobus), Western
white pine (Pinus monticola), sugar pine (Pinus
lambertiana), lodgepole pine (Pinus contorta), Monterey
pine (Pinus radiata), Afghan pine (Pinus eldarica),
Scots pine (Pinus sylvestris) and Virginia pine (Pinus
virginiana). Other tree species, including poplar,
eucalyptus and aspen, may also be transformed using the
nucleotide sequences of this invention. However, the
invention is not limited to trees: crop and forage
plants such as maize, tobacco, alfalfa, wheat and
grasses may also be transformed using the constructs
provided by this invention in order to modify lignin
content. In general, this invention can be applied to
any plant species that can be transformed.

Throughout this specification and claims, unless the 
context requires otherwise, the word "comprise", or 
variations such as "comprises" or "comprising", is to be
understood to imply the inclusion of a stated integer or 
group of integers but not the exclusion of any other 
integer or group of integers. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows the nucleic acid sequence of the
coniferin β-glucosidase cDNA and the amino acid sequence
of the encoded protein.

Fig. 2 is a dendrogram illustrating amino
acid sequence comparisons among plant, two bacterial
and one human family 1 glycosyl hydrolases and a family
3 glycosyl hydrolase from Agrobacterium tumefaciens.
The dendrogram was constructed using the GeneWorks
CLUSTAL V program. Database accession numbers are in
parentheses: 1, A. tumefaciens coniferin β-G (a42292);
2, Brassica napus thio β-G (q00326); 3, Sinapis alba
thio β-G (p29092); 4, B. napus thio β-G (s56656); 5,
B. napus thio β-G (s39549); 6, Arabidopsis thaliana
thio β-G (p37702); 7, Pinus contorta coniferin β-G; 8,
Prunus serotina cyanogenic β-G (u50201); 9, Prunus
serotina cyanogenic β-G (u26025); 10, Trifolium repens
cyanogenic β-G (p26205); 11, T. repens β-G (p26204);
12, Costus speciosus furostanol 26-O-β-G (d83177); 13,
Manihot esculenta cyanogenic β-G (s23940); 14, Oryza
sativa cyanogenic β-G (u28047); 15, Hordeum vulgare
cyanogenic β-G (a57512); 16, Avena sativa β-G (s50756);
17, Sorghum bicolor cyanogenic β-G (u33817); 18, Zea
mays β-G (j,49235); 19, Brassica nigra β-G (u72154);
20, A. thaliana β-G (u72153); 21, B. napus β-G
(s52771); 22, Agrobacterium faecalis cellobiase
(g67489); 23, Bacillus circulans cellobiase (q03506);
24, Homo sapiens lactase-phlorizin hydrolase domain IV
(p09848).

Fig. 3 shows the alignment of the CBG amino
acid sequence (GBAA) with the following amino acid
sequences: Hordeum β-glucosidase (L41869); Prunus
amygdalin hydrolase (U26025); Prunus β-glucosidase
(X56733); Trifolium cyanogenic β-glucosidase (X56733);
Trifolium non-cyanogenic β-glucosidase (P26204); Manihot
β-glucosidase (X94986); Manihot linamarase (S35175);
Sorghum dhurrinase (U33817); Zea β-glucosidase (A48860);
Avena β-glucosidase (X78433); Arabidopsis
thioglucosidase (X89413); Brassica β-glucosidase
(S52711); Brassica thioglucosidase (Q00326);
Arabidopsis β-glucosidase (L11454); Human LPH subunit
III (LPH3HU); Bacillus β-glucosidase (A48969); Bacillus
β-glucosidase (Q08638); Streptomyces β-glucosidase
(S45675). In the alignment, one symbol marks perfectly
conserved amino acids and another marks well-conserved
amino acids.

Fig. 4 shows a transformation vector suitable
for introducing antisense CBG into plants.



DETAILED DESCRIPTION OF THE INVENTION

Definitions and Abbreviations 

In order to facilitate review of the various
embodiments of the invention, the following definitions
of terms and explanations of abbreviations are provided:

4-NPG: 4-nitrophenyl β-glucoside
2-NPG: 2-nitrophenyl β-glucoside
MUG: 4-methylumbelliferyl β-glucoside
VRA-G: 5,4-(β-D-glucopyranosyloxy)-3-methoxyphenylmethylene-2-thioxothiazolidin-4-one-3-ethanoic acid.
VRA-G is a substrate analog of coniferin synthesized by
Biosynth International Inc., Skokie, Illinois.
EDC: 1-ethyl-3-(dimethylaminopropyl)carbodiimide
PAL: phenylalanine ammonia-lyase
CAD: cinnamyl alcohol dehydrogenase
4CL: 4-coumarate:CoA ligase
COMT: caffeic acid 3-O-methyltransferase
PAGE: polyacrylamide gel electrophoresis
CBG: coniferin β-glucosidase
Isolated: An "isolated" nucleic acid has been
substantially separated or purified away from other
nucleic acid sequences in the cell of the organism in
which the nucleic acid naturally occurs, i.e., other
chromosomal and extrachromosomal DNA and RNA. The term
"isolated" thus encompasses nucleic acids purified by
standard nucleic acid purification methods. The term
also embraces nucleic acids prepared by recombinant
expression in a host cell, as well as chemically
synthesized nucleic acids.

cDNA (complementary DNA): a piece of DNA
lacking internal, non-coding segments (introns) and
regulatory sequences which determine transcription.
cDNA is synthesized in the laboratory by reverse
transcription from messenger RNA extracted from cells.

ORF (open reading frame): a series of
nucleotide triplets (codons) coding for amino acids
without any termination codons. These sequences are
usually translatable into a peptide.
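A common way to locate an ORF computationally is to scan each reading frame for an ATG start codon followed in-frame by a stop codon. The Python sketch below does this for the three forward frames of one strand and returns the longest hit; the ATG-to-stop convention and the function name are assumptions made for illustration, and a full search would also scan the complementary strand.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[Tuple[int, int]]:
    """1-based (start, end) of the longest ATG...stop ORF on the given strand.

    `end` includes the stop codon. Only the three forward frames are scanned;
    returns None if no ATG is followed in-frame by a stop codon.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # 0-based index of the open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if best is None or length > best[1] - best[0] + 1:
                    best = (start + 1, i + 3)
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGGG"))  # (3, 14)
```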
Ortholog: two nucleotide sequences are
orthologs of each other if they share a common ancestral
sequence and diverged when a species carrying that
ancestral sequence split into two species. Orthologous
sequences are also homologous sequences.
Probes and primers: Nucleic acid probes and
primers may readily be prepared based on the nucleic
acids provided by this invention. A probe comprises an
isolated nucleic acid attached to a detectable label or
reporter molecule. Typical labels include radioactive
isotopes, ligands, chemiluminescent agents, and enzymes.
Methods for labeling and guidance in the choice of
labels appropriate for various purposes are discussed,
e.g., in Sambrook et al. (1989) and Ausubel et al.
(1987).

Primers are short nucleic acids, preferably
DNA oligonucleotides 15 nucleotides or more in length.
Primers may be annealed to a complementary target DNA
strand by nucleic acid hybridization to form a hybrid
between the primer and the target DNA strand, and then
extended along the target DNA strand by a DNA polymerase
enzyme. Primer pairs can be used for amplification of a
nucleic acid sequence, e.g., by the polymerase chain
reaction (PCR) or other nucleic acid amplification
methods known in the art.

Methods for preparing and using probes and
primers are described, for example, in Sambrook et al.
(1989), Ausubel et al. (1987), and Innis et al. (1990).
PCR primer pairs can be derived from a known sequence,
for example, by using computer programs intended for
that purpose such as Primer (Version 0.5, © 1991,
Whitehead Institute for Biomedical Research, Cambridge,
MA).

Purified: the term "purified" does not require
absolute purity; rather, it is intended as a relative
term. Thus, for example, a purified coniferin
β-glucosidase protein preparation is one in which the
coniferin β-glucosidase protein is more pure than the
protein in its natural environment within a cell.
Preferably, a preparation of a coniferin β-glucosidase
protein is purified such that the subject protein
represents at least 50% of the total protein content of
the preparation.
Operably linked: A first nucleic acid
sequence is operably linked with a second nucleic acid
sequence when the first nucleic acid sequence is placed
in a functional relationship with the second nucleic
acid sequence. For instance, a promoter is operably
linked to a coding sequence if the promoter affects the
transcription or expression of the coding sequence.
Generally, operably linked DNA sequences are contiguous
and, where necessary to join two protein coding regions,
in the same reading frame.

Recombinant: A recombinant nucleic acid is
one that has a sequence that is not naturally occurring
or has a sequence that is made by an artificial
combination of two otherwise separated segments of
sequence. This artificial combination is often
accomplished by chemical synthesis or, more commonly, by
the artificial manipulation of isolated segments of
nucleic acids, e.g., by genetic engineering techniques.

Transgenic plant: as used herein, this term
refers to a plant that contains recombinant genetic
material not normally found in plants of this type and
which has been introduced into the plant in question (or
into progenitors of the plant) by human manipulation.
Thus, a plant that is grown from a plant cell into which
recombinant DNA is introduced by transformation is a
transgenic plant, as are all offspring of that plant
which contain the introduced DNA (whether produced
sexually or asexually).

Coniferin β-glucosidase: The defining
functional characteristic of the coniferin β-glucosidase
enzyme is its ability to hydrolyze coniferin to release
coniferyl alcohol. This activity can be measured using
the glucosidase assay described herein. This invention
provides a cDNA encoding the coniferin β-glucosidase
enzyme from Pinus contorta. However, the invention is
not limited to this particular coniferin β-glucosidase:
other nucleotide sequences which encode coniferin
β-glucosidase enzymes are also part of the invention,
including variants of the disclosed cDNA sequence and
orthologous sequences from other plant species, the
cloning of which is now enabled. Such sequences share
the functional characteristic of encoding an enzyme that
is capable of hydrolyzing coniferin.

Additional definitions of terms commonly used
in molecular genetics can be found in Benjamin Lewin,
Genes V, published by Oxford University Press, 1994 (ISBN
0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia
of Molecular Biology, published by Blackwell Science
Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers
(ed.), Molecular Biology and Biotechnology: a
Comprehensive Desk Reference, published by VCH
Publishers, Inc., 1995 (ISBN 1-56081-569-8).
Sequence Listing

Seq. I.D. No. 1 is primer N7A
Seq. I.D. No. 2 is primer N7B
Seq. I.D. No. 3 is primer N10
Seq. I.D. No. 4 is primer CBG172
Seq. I.D. No. 5 is primer CBG75
Seq. I.D. No. 6 is the CBG cDNA and CBG peptide
Seq. I.D. No. 7 is primer NT1
Seq. I.D. No. 8 is primer CT1
Seq. I.D. Nos. 9-12 are primers useful for
amplification of the CBG cDNA sequence (see Example 4
below).

In the Sequence Listing, standard
abbreviations are used for nucleotide bases, i.e., A =
Adenine, G = Guanine, C = Cytosine, T = Thymine, I =
Inosine, M = A or C, R = A or G, W = A or T, S = C or G,
Y = C or T and K = G or T.
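A degenerate primer written with these codes stands for a mixture of concrete oligonucleotides. The Python sketch below expands such a primer using only the codes listed above; the function name is illustrative, and inosine is deliberately left unexpanded because it is a single nucleotide that pairs with several bases rather than a mixture.

```python
from itertools import product
from typing import List

# IUPAC symbols as listed in the Sequence Listing conventions. Inosine (I) is
# one nucleotide that pairs with several bases, so it is not expanded.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "I": "I",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
}


def expand(primer: str) -> List[str]:
    """All concrete sequences represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(p) for p in product(*choices)]


print(expand("GAYMGI"))  # ['GACAGI', 'GACCGI', 'GATAGI', 'GATCGI']
```

For example, "GAYMGI" is the same six positions as the start of the gene-specific primers in Example 1, written in single-letter IUPAC form.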

Detailed aspects of the invention are provided
in the following examples.

EXAMPLE 1

Identification of the Coniferin β-Glucosidase cDNA

Actively differentiating Pinus contorta xylem
was harvested as described by Dharmawardhana et al.
(1995) and used to isolate total RNA as described by
Lewinsohn et al. (1994). Poly(A) RNA isolated with an
Oligotex mRNA isolation kit (Qiagen) was used to
construct a cDNA library in the λZAP-XR vector,
employing Stratagene cDNA synthesis and Gigapack II Gold
packaging kits.

Coniferin β-glucosidase enzyme was purified
from Pinus contorta xylem tissue as described by
Dharmawardhana et al. (1995). In order to determine the
N-terminal amino acid sequence of the purified enzyme,
it was run on native PAGE gels, stained for activity on
the synthetic coniferin substrate VRA-G, and the staining
band was excised and subjected to SDS-PAGE. The protein
was then transferred to an Immobilon membrane for
N-terminal amino acid sequencing using an Applied
Biosystems 470A gas phase sequencer (Edman degradation).

Gene-specific primers for PCR amplification of
CBG sequence fragments were then designed based on the
15-residue N-terminal amino acid sequence obtained.
Primers N7A and N7B were based on the first 7 N-terminal
amino acid residues and were identical except at the
third base from the 3' end, where the degeneracy is
split between the primers.

N7A: 5' GCTCTAGAGCGAC(T)A(C)GIAAC(T)AAC(T)TTTCC 3' (Seq.
I.D. No. 1)

N7B: 5' GCTCTAGAGCGAC(T)A(C)GIAAC(T)AAC(T)TTCCC 3' (Seq.
I.D. No. 2)
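The primer notation above can be checked programmatically. The sketch below assumes that a base followed by a parenthesized base, such as "C(T)", means "C or T" at that single position; under that reading it counts how many distinct oligonucleotides each primer mixture contains and locates the position where N7A and N7B differ. The function names are hypothetical.

```python
import math
import re
from typing import List

# Assumed reading of the notation: "C(T)" means C or T at one position.
_POSITION = re.compile(r"([ACGTI])(?:\(([ACGT])\))?")


def parse_alternatives(primer: str) -> List[str]:
    """Per-position base choices, e.g. 'GAC(T)' -> ['G', 'A', 'CT']."""
    return [base + alt for base, alt in _POSITION.findall(primer.replace(" ", ""))]


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the degenerate mixture."""
    return math.prod(len(choice) for choice in parse_alternatives(primer))


n7a = "GCTCTAGAGCGAC(T)A(C)GIAAC(T)AAC(T)TTTCC"
n7b = "GCTCTAGAGCGAC(T)A(C)GIAAC(T)AAC(T)TTCCC"
print(degeneracy(n7a), degeneracy(n7b))  # 16 16
pairs = zip(parse_alternatives(n7a), parse_alternatives(n7b))
print([i for i, (a, b) in enumerate(pairs) if a != b])  # [24]
```

Each primer has 27 positions, so index 24 (0-based) is the third base from the 3' end, matching the description of where the degeneracy is split.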

The amplification template used was the λZAP-cDNA
library described above. The initial PCR reactions
contained 200-300 ng λZAP-cDNA as template, 200 nM
degenerate gene-specific primer N7A or N7B, 50 nM vector
primer M13F or T7 (BRL), 200 µM dNTP, and 1X reaction
buffer (10 mM Tris-HCl pH 8.3, 1.5 mM MgCl2, 50 mM KCl)
in a 50 µl volume. Prior to adding 3 units Taq polymerase
(Boehringer), the reaction mixture was heated to 94°C for
2 min. The thermal cycling regime was as follows: 1-2
cycles (94°C/1 min, 48-52°C/2 min, 72°C/2 min); 30 cycles
(94°C/45 sec, 55°C/1 min, 72°C/2 min); and a 72°C/10 min
final extension.
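The thermal cycling regime can be written down as data, which makes it easy to total the programmed hold time. In the Python sketch below, the number of low-stringency cycles is taken as 2 and their annealing temperature as 50°C (the midpoint of 48-52°C); both values are assumptions, as are the names used. Ramp times between temperatures are ignored.

```python
from typing import List, Tuple

# Each stage: (number of cycles, [(temperature_C, hold_seconds), ...]).
Program = List[Tuple[int, List[Tuple[float, int]]]]

PROGRAM: Program = [
    (1, [(94, 120)]),                        # 2 min at 94°C before adding Taq
    (2, [(94, 60), (50, 120), (72, 120)]),   # assumed: 2 cycles, 50°C anneal
    (30, [(94, 45), (55, 60), (72, 120)]),   # main amplification cycles
    (1, [(72, 600)]),                        # 10 min final extension
]


def total_hold_minutes(program: Program) -> float:
    """Sum of programmed hold times in minutes; ramp times are ignored."""
    return sum(n * sum(sec for _, sec in steps) for n, steps in program) / 60


def amplification_cycles(program: Program) -> int:
    """Count cycles in multi-step (denature/anneal/extend) stages only."""
    return sum(n for n, steps in program if len(steps) > 1)


print(total_hold_minutes(PROGRAM))    # 134.5
print(amplification_cycles(PROGRAM))  # 32
```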

Amplification using primer N7B yielded 3-4
major bands, whereas amplification with N7A did not
yield a consistent product, suggesting a mismatch at the
degenerate third base. To increase specificity and
identify the desired amplification product, a 20 ng
aliquot of reaction products from the initial PCR using
N7B was reamplified using the partially nested
gene-specific primer N10
[GAC(T)A(C)GIAAC(T)AAC(T)TTCCCIT(A)C(G)IGA(T)TT, Seq.
I.D. No. 3] and vector primer T7 (30 cycles of
94°C/45 sec, 55°C/1 min, 72°C/2 min, followed by a
72°C/10 min final extension), yielding a 1.7 kb band.

Following identification of the 1.7 kb band as
the desired amplification product, the initial PCR
reaction was repeated with less (0.9 mM) MgCl2 in the
reaction buffer. The resulting 1.7 kb band was then
isolated by gel purification (Qiagen) and cloned into
EcoRV-digested, T-tailed Bluescript II KS vector
according to the T/A cloning protocol (Holton and
Graham, 1991). Plasmid minipreps from several clones
were used for restriction analysis of the insert and for
primer-directed sequencing of both strands using ABI
AmpliTaq dye terminator cycle sequencing.
To amplify the 5' end of the CBG cDNA, λZAP-cDNA
from the library was again used as a template, this
time in conjunction with a T3 vector primer and the
gene-specific primer CBG172 (CACATATCTGTGATATTGGTCG,
Seq. I.D. No. 4), based on the sequence of the 3' CBG
amplification product. A second, nested gene-specific
primer, CBG75 (CCATCTTCTCGGACTGCTC, Seq. I.D. No. 5),
was used to reamplify the former reaction products to
confirm the authenticity of the PCR product. The
cloning and sequencing of the 5' PCR product were
conducted as described above. An exact sequence match
in the overlapping regions of the 5' and 3' end
clones confirmed the authenticity of the 5'
amplification product.
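The text does not say how the annealing behaviour of CBG172 and CBG75 was evaluated. As a rough illustration only, the Python sketch below computes their GC content and a Wallace-rule melting temperature, Tm = 2(A+T) + 4(G+C) °C. The Wallace rule is a crude estimate intended for short oligonucleotides and is not claimed to be the method used here.

```python
def gc_content(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough Tm in °C by the Wallace rule, 2(A+T) + 4(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


for name, seq in [("CBG172", "CACATATCTGTGATATTGGTCG"),
                  ("CBG75", "CCATCTTCTCGGACTGCTC")]:
    print(name, len(seq), round(gc_content(seq), 1), wallace_tm(seq))
# CBG172 22 40.9 62
# CBG75 19 57.9 60
```

Nearest-neighbour thermodynamic models give more reliable estimates for primers of this length; the simple rule is used here only because it is transparent.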

EXAMPLE 2
Analysis of the CBG cDNA Sequence 

The complete CBG cDNA sequence, shown in Fig.
1 and in Seq. I.D. No. 6, is 1909 bp in length.
Nucleotides 183-1721 of the 1909 bp encode a 513 amino
acid protein (Fig. 1). The 5'- and 3'-untranslated
regions of the full-length sequence contain 162 and 187
nucleotides, respectively. As is the case for more than
50% of reported plant mRNA sequences (Wu et al., 1995),
the 3'-untranslated region does not contain the
conserved eukaryotic polyadenylation signal AAUAAA.
Instead, like most plant mRNAs, the CBG 3'-untranslated
region contains AAUAAA-like sequences (Joshi, 1987).
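The coding-region coordinates can be checked with simple arithmetic: nucleotides 183-1721 inclusive span 1539 nt, or 513 codons, which agrees with the stated 513-residue protein if the stop codon lies outside this range (the text does not say). The small helper below, with a hypothetical name, performs that check and rejects lengths that are not a whole number of codons.

```python
def cds_summary(cds_start: int, cds_end: int) -> dict:
    """Length and codon count for a coding region given 1-based inclusive coordinates."""
    length = cds_end - cds_start + 1
    if length % 3:
        raise ValueError("coding region length is not a multiple of 3")
    return {"cds_nt": length, "codons": length // 3}


print(cds_summary(183, 1721))  # {'cds_nt': 1539, 'codons': 513}
```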
The 5'-UTR of the CBG cDNA carries a 9 bp AC-rich
element (AACCAACAA) that is also present in the
Arabidopsis PAL1 and bean chalcone synthase (CHS15)
genes, and has been proposed to be an elicitor-inducible
hypersensitive site (Lawton et al., 1990; Ohl et al.,
1990). This indirectly associates CBG with other
phenylpropanoid metabolic genes and their regulation,
and is consistent with the induction of CBG activity in
jack pine cell cultures by fungal elicitation (Campbell &
Ellis, 1991).

The deduced 513 amino acid protein has a
molecular weight of 58.3 kD and a calculated isoelectric
point of 4.9. The N-terminal amino acid sequence
determined for the purified enzyme corresponds to amino
acids 24-40 in the deduced sequence. Met35 in the
deduced sequence was identified as Thr during N-terminal
amino acid sequencing. This mismatch could result from
a misidentification during amino acid sequencing, or
could represent a polymorphism. The nascent protein
contains an N-terminal signal peptide with features
characteristic of eukaryotic secretory signal sequences
for ER targeting. The "weight matrix" method (von
Heijne, 1986) predicts two possible cleavage sites for
the signal peptide, one between residues Gly17 and
Phe18, and a second between Ala23 and Arg24. Since the
N-terminal amino acid sequence of the mature protein
begins at Arg24, co-translational processing of the
signal peptide appears to occur at the second predicted
cleavage site. The protein contains two putative
N-glycosylation sites, at Asn223 and Asn447,
consistent with the detection of oligosaccharide
side chains in the purified enzyme (Dharmawardhana et
al., 1995).
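The site assignments above can be cross-checked programmatically. The sketch below assumes the usual N-X-S/T sequon (X not Pro) as the definition of a putative N-glycosylation site, and takes the 23-residue signal peptide implied by cleavage between Ala23 and Arg24. The example peptide is invented, because the CBG sequence itself is given in the sequence listing rather than here.

```python
import re

# Putative N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNST") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def mature_chain(precursor: str, signal_peptide_length: int = 23) -> str:
    """Remove the N-terminal signal peptide (cleavage after residue 23, so Arg24 is first)."""
    return precursor[signal_peptide_length:]


# Invented peptide: N3-G-S is a sequon, N8-P-T is not (Pro at X), N12-V-T is.
print(n_glycosylation_sites("MKNGSAANPTQNVT"))
```

Positions are reported in precursor numbering, matching the convention used for Asn223 and Asn447 above.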

Nucleotide and amino acid sequence homology
searches and comparisons were carried out using BLAST
(Altschul et al., 1990) on the GenBank, EMBL, PDB,
SWISS-PROT and PIR databases. Further sequence analysis was
performed using PC/GENE or GeneWorks (IntelliGenetics
Inc.) software. When compared with other glycohydrolase
sequences in the databases, the derived amino acid sequence
of CBG showed the strongest similarity to enzymes
belonging to family 1 glycosyl hydrolases (Henrissat,
1991). The β-glucosidases showing the highest
similarity (30-50% identity) to CBG were from the plant
genera Prunus, Hordeum, Trifolium, Manihot, Sorghum,
Avena and Costus. The dendrogram in Fig. 2 illustrates
that, among the plant β-glucosidases, pine CBG is loosely
clustered with cyanogenic β-glucosidases from several
species (Fig. 2: sequences 7 to 13). Fig. 3 shows an
alignment of the CBG amino acid sequence with those of
β-glucosidases from other species.
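Identity figures such as the 30-50% quoted above depend on how gapped columns are counted. As an illustration only, the function below computes identity over the ungapped columns of two pre-aligned sequences. This convention is an assumption: BLAST and other tools may normalize differently, for example by total alignment length including gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Toy alignment (not real CBG data): 4 identities over 5 ungapped columns.
print(percent_identity("ACDE-G", "ACDF-G"))
```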

CBG contains several sequence elements that
are highly conserved among many family 1 β-glucosidases.
Between residues 34 and 48 it carries the N-terminal
signature sequence

F, X, (FYVM), (GSTA), (GSTA), X, (GSTA), (GSTA), (FYN), X, E, X, (GSTA)

characteristic of family 1 glycosyl
hydrolases (Henrissat, 1991). Two of the five cysteine
residues found in CBG (Cys175 and Cys225) are also
conserved in these homologous β-glucosidases, suggesting
that they may be involved in forming important
intramolecular disulfide bridges.
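The signature notation above maps directly onto a regular expression, which makes it easy to locate the motif in any candidate family 1 sequence. The sketch assumes that "X" denotes any residue and that a parenthesized group denotes any one of the listed residues. The pattern string is copied from the signature as given in this section, and the test peptide is invented.

```python
import re

# Signature as given in the text: one comma-separated element per position.
SIGNATURE = "F, X, (FYVM), (GSTA), (GSTA), X, (GSTA), (GSTA), (FYN), X, E, X, (GSTA)"


def signature_to_regex(sig: str) -> str:
    """Convert the comma-separated signature notation into a regex string."""
    parts = []
    for element in (e.strip() for e in sig.split(",")):
        if element == "X":                       # any residue
            parts.append(".")
        elif element.startswith("(") and element.endswith(")"):
            parts.append("[" + element[1:-1] + "]")  # one of the listed residues
        else:
            parts.append(re.escape(element))     # a fixed residue
    return "".join(parts)


def find_signature(protein: str, sig: str = SIGNATURE) -> list[int]:
    """1-based start positions of signature matches (overlapping matches allowed)."""
    pattern = re.compile("(?=" + signature_to_regex(sig) + ")")
    return [m.start() + 1 for m in pattern.finditer(protein.upper())]


print(signature_to_regex(SIGNATURE))
print(find_signature("MKFQFGSLAGYQEVS"))
```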

Other conserved sequence elements include the
sequence -ENG- at residues 408-410 within the C-terminal
signature, and the sequence -NEP- at residues 190-192.
These sequence motifs are thought to be important for
enzyme activity, and this region may be involved in
binding of the pyranose ring during catalysis. The NEP
motif of both Bacillus endo-β-1,4-glucanase and CBG is
flanked by hydrophobic amino acids; apart from the signal
peptide, it is the most hydrophobic region of the CBG
enzyme. The hydrolytic mechanism of the family 1
β-glucosidases is considered to be general acid catalysis
(Sinnott, 1990), with Glu and Asp residues in conserved
motifs serving as the active-site nucleophile and acid
catalyst. Evidence from inhibitor and site-directed
mutagenesis studies suggests that Glu408 within the
conserved ENG motif is the active-site nucleophile
(Withers et al., 1990; Trimbur et al., 1992). A
conserved aspartate residue (Asp427) located 19 residues
downstream from the ENG motif of CBG appears to be
analogous to Asp374 of Agrobacterium β-glucosidase
(cellobiase). This carboxylate side chain may play the
role of acid-base catalyst during hydrolysis of the
glycosidic linkage (Trimbur et al., 1992).

EXAMPLE 3
Expression of CBG cDNA in E. coli

To express CBG protein in E. coli, the full-length
coding region for the mature protein (i.e.,
excluding the signal peptide) was amplified using the
3'-end clone (1A6) as the template, with the N-terminal
primer NT1 (5' TAGCTAGCAGGCTGGACAGGAACAACTTC 3', Seq.
I.D. No. 7), containing a 5' NheI site, and the C-terminal
primer CT1 (5' CTCGAGACAAGCAGTCTAAATGCT 3', Seq. I.D.
No. 8), containing an XhoI site. The resulting 1.5 kb DNA
fragment was ligated into Bluescript II KS by T/A
cloning as described above. The structure of the
junctions of this construct was confirmed by sequencing,
and the insert was then transferred as an NheI/XhoI fragment into
the expression vector pET21a (Novagen). Because the NheI
site was used to introduce the cDNA into the pET vector,
three non-CBG amino acids (Met, Ala, Ser) were added to
the N-terminus of the expressed protein. To avoid
expression of the vector-encoded His-tag at the 3' end, the
native stop codon of CBG was included. The expressed
protein was thus identical in sequence to the mature CBG
expressed in planta, except for the additional
tripeptide at the N-terminus. Following transformation
into E. coli strain DH5α and verification of plasmid
integrity by restriction digestion, the construct was introduced
into the expression host BL21(DE3).
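The origin of the extra Met-Ala-Ser tripeptide can be checked by translating the vector/insert junction. The sketch assumes, consistently with the text but not stated in it, that the pET21a start codon is immediately followed by the NheI site (ATG GCT AGC). It joins that to the NT1 primer sequence downstream of its NheI site and translates with the standard genetic code. The residue after Ser should be Arg, matching the N-terminal Arg24 of the mature protein.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA sequence in frame 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


NT1 = "TAGCTAGCAGGCTGGACAGGAACAACTTC"   # Seq. I.D. No. 7 (NheI site GCTAGC)
CT1 = "CTCGAGACAAGCAGTCTAAATGCT"        # Seq. I.D. No. 8 (XhoI site CTCGAG)
NHEI = "GCTAGC"

# Assumed vector side of the junction: start codon followed directly by NheI.
insert_side = NT1[NT1.index(NHEI) + len(NHEI):]
junction = "ATG" + NHEI + insert_side
print(translate(junction))
```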

To express CBG, the bacteria were grown to log
phase (A600 = 0.6-0.9), followed by an additional 2-3 h
incubation at 29-37°C in the presence or absence of
0.4-1 mM IPTG. The expressed CBG in the soluble protein
fraction was purified by preparative Q-Sepharose
chromatography followed by QMA-MemSep (Millipore)
chromatography.

As noted above, the functional characteristic
of the CBG enzyme is its ability to hydrolyze coniferin.
This activity can be measured using the simple
β-glucosidase assay described by Dharmawardhana et al.
(1995), conducted as follows: enzyme preparations
(10-50 µl) and glucoside substrate (coniferin; 2 mM final
concentration) in 0.2 M MES buffer, pH 5.5, in a final
volume of 150 µl are incubated at 30°C for 30 min. The
reaction is stopped by basification of the assay mixture
with an equal volume of 0.5 M CAPS buffer, pH 10.5 (Sigma Chemical
Co., St. Louis, MO), and the activity is measured
by determining the absorbance of the released aglycone.
The activity of the enzyme can be measured not only
against coniferin, but also against related glucosides,
including 4-NPG, 2-NPG, MUG and the synthetic coniferin
analog VRA-G. For quantitative calculations, the
following analysis wavelengths and ε values (mM⁻¹ cm⁻¹)
were used: coniferyl alcohol, 325 nm, ε = 7.0; sinapyl
alcohol, 315 nm, ε = 11.2; 2-nitrophenol, 420 nm, ε = 4.55;
4-nitrophenol, 400 nm, ε = 19.3; 4-methylumbelliferone,
360 nm, ε = 18.25; VRA-G, 490 nm, ε = 38.6; salicyl
alcohol, 295 nm, ε = 3.3.
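The conversion from absorbance to activity implied by these ε values follows the Beer-Lambert law. The sketch below is an illustration under stated assumptions: absorbance is read in the stopped 300 µl mixture (150 µl assay plus 150 µl CAPS) with a 1 cm light path and is already blank-corrected. Apart from the volumes, times and ε values, none of this is specified in the text.

```python
# Analysis wavelength (nm) and molar absorption coefficient (mM^-1 cm^-1)
# for each released aglycone, as listed in the text.
EPSILON = {
    "coniferyl alcohol": (325, 7.0),
    "sinapyl alcohol": (315, 11.2),
    "2-nitrophenol": (420, 4.55),
    "4-nitrophenol": (400, 19.3),
    "4-methylumbelliferone": (360, 18.25),
    "VRA-G": (490, 38.6),
    "salicyl alcohol": (295, 3.3),
}


def activity_pkat(absorbance: float, aglycone: str, volume_ul: float = 300.0,
                  minutes: float = 30.0, path_cm: float = 1.0) -> float:
    """Enzyme activity in pkat (pmol of aglycone released per second).

    Beer-Lambert: c (mM) = A / (epsilon * l); since 1 mM = 1 nmol/ul,
    the amount released is c * volume_ul nmol.
    """
    _, epsilon = EPSILON[aglycone]
    nmol = absorbance / (epsilon * path_cm) * volume_ul
    return nmol * 1000.0 / (minutes * 60.0)   # nmol/s -> pmol/s (pkat)


# Example: A325 = 0.70 for coniferyl alcohol -> 0.1 mM -> 30 nmol in 30 min.
print(round(activity_pkat(0.70, "coniferyl alcohol"), 2))
```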

Soluble proteins and insoluble proteins
(inclusion bodies) prepared from induced and uninduced
bacterial cells were assayed for coniferin hydrolysis
activity by the method described above. Only the
soluble protein fraction of induced cells displayed this
activity. The activity in this fraction could be
increased up to 2-fold by increasing the IPTG
concentration from 0.4 to 1.0 mM, and by reducing the
growth temperature from 37°C to 29°C. Activity staining
of nondenaturing PAGE gels using the chromogenic
coniferin analogue VRA-G revealed a β-glucosidase-active
protein band in induced cell extracts. This protein was
purified by anion exchange chromatography using
coniferin as the substrate for monitoring β-glucosidase
activity. The purified enzyme often migrated as a
doublet on nondenaturing gels. Both protein bands in
the doublet showed β-glucosidase activity, as assayed by
hydrolysis of VRA-G. This could be due to partial
degradation, alternate forms of folding, or the
synthesis of a truncated protein initiated near the 5' end
of the CBG sequence, where CBG contains a prokaryotic
Shine-Dalgarno ribosome binding sequence (GAAGGAG). The latter
would result in the synthesis of a polypeptide that is
truncated at the N-terminus, as opposed to the full-length
polypeptide initiated by ribosome binding to the standard
ribosome binding site in the vector. As shown in Table 1 below,
the CBG expressed in E. coli and the enzyme purified
from pine xylem showed almost identical substrate
specificities.
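The internal-initiation explanation can be examined by scanning the coding sequence for the GAAGGAG motif and any ATG or GTG start codon a short distance downstream. In this sketch the 4-12 nt spacing window and the acceptance of GTG as a start codon are assumptions based on typical E. coli translation initiation, not values from the text, and the test sequence is invented. A hit in frame 0 of the CDS would be consistent with an N-terminally truncated, in-frame product.

```python
import re

SHINE_DALGARNO = "GAAGGAG"   # motif noted in the CBG sequence


def internal_starts(cds: str, motif: str = SHINE_DALGARNO,
                    min_gap: int = 4, max_gap: int = 12):
    """Return (motif position, start codon position, frame) tuples, 0-based,
    for ATG/GTG codons min_gap..max_gap nt downstream of each motif match.
    frame is position % 3 relative to the first base of the CDS."""
    seq = cds.upper()
    hits = []
    for m in re.finditer("(?=" + motif + ")", seq):
        motif_end = m.start() + len(motif)
        for gap in range(min_gap, max_gap + 1):
            pos = motif_end + gap
            if seq[pos:pos + 3] in ("ATG", "GTG"):
                hits.append((m.start(), pos, pos % 3))
    return hits


# Invented sequence: motif at 6, an ATG 7 nt after the motif ends.
test_cds = "ATGAAA" + "GAAGGAG" + "AAAAAAA" + "ATGCCC"
print(internal_starts(test_cds))
```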

Table 1. Substrate specificity of coniferin β-glucosidase
purified from pine xylem and of E. coli-expressed CBG cDNA.
100% activity represents 14 pkat for native coniferin
β-glucosidase and 22 pkat for the recombinant enzyme.

                                      Relative activity (%)
Substrate                             Native CBG    E. coli CBG
coniferin                             100           100
syringin                              51            65
4-methylumbelliferyl-β-glucoside      18            20
2-nitrophenyl-β-glucoside             51            50
4-nitrophenyl-β-glucoside             30            35
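Because the two columns of Table 1 are normalized to different full-scale activities, converting them back to absolute values can help when comparing the native and recombinant enzymes directly. This sketch assumes that each percentage scales linearly to the stated 100% values (14 pkat native, 22 pkat recombinant).

```python
# Relative activities (%) from Table 1: (native CBG, E. coli CBG).
TABLE_1 = {
    "coniferin": (100, 100),
    "syringin": (51, 65),
    "4-methylumbelliferyl-beta-glucoside": (18, 20),
    "2-nitrophenyl-beta-glucoside": (51, 50),
    "4-nitrophenyl-beta-glucoside": (30, 35),
}
FULL_SCALE_PKAT = (14.0, 22.0)   # 100% activity: native, recombinant


def absolute_pkat(table=TABLE_1, full_scale=FULL_SCALE_PKAT):
    """Convert each (native %, recombinant %) pair to absolute pkat values."""
    return {substrate: tuple(pct / 100.0 * ref for pct, ref in zip(rel, full_scale))
            for substrate, rel in table.items()}


for substrate, (native, recombinant) in absolute_pkat().items():
    print(f"{substrate:38s} {native:6.2f} {recombinant:6.2f}")
```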






EXAMPLE 4

Preferred Method for Making the CBG cDNA

With the provision of the CBG cDNA sequence
shown in Seq. I.D. No. 6, the polymerase chain reaction
(PCR) may now be utilized in a preferred method for
producing the CBG cDNA. PCR amplification of the CBG
cDNA sequence may be accomplished either by direct PCR
from an appropriate cDNA library or by reverse-transcription
PCR (RT-PCR) using RNA extracted from
plant cells as a template. Methods and conditions for
both direct PCR and RT-PCR are known in the art and are
described in Innis et al. (1990). Suitable plant cDNA
libraries for direct PCR include the Pinus contorta
library described above. Other plant cDNA libraries
may be used in order to amplify orthologous cDNAs of
other species; for example, the Arabidopsis cDNA library
described by Newman et al. (1994) may be used to amplify
the Arabidopsis ortholog.

The selection of PCR primers will be made
according to the portions of the cDNA that are to be
amplified. Primers may be chosen to amplify small
segments of the cDNA or the entire cDNA molecule.
Variations in amplification conditions may be required
to accommodate primers of differing lengths; such
considerations are well known in the art and are
discussed in Innis et al. (1990), Sambrook et al.
(1989) and Ausubel et al. (1992). By way of example
only, the entire CBG cDNA molecule as shown in Seq. I.D.
No. 6 may be amplified using the following combination
of primers:

5' GGATTTGGACCTGAAAATATCAAT 3' (Seq. I.D. No. 9)
5' CAATGTTCTTACCCTGCAGTTCCC 3' (Seq. I.D. No. 10)

The open reading frame portion of the cDNA may be
amplified using the following primer pair:

5' ATGGAGGTGTCTGTGTTGATGTGGGTA 3' (Seq. I.D. No. 11)
5' AATGCTGCTGCTGCTTCTAATACTTCC 3' (Seq. I.D. No. 12)

These primers are illustrative only; it will be
appreciated by one skilled in the art that many
different primers may be derived from the provided cDNA
sequence in order to amplify particular regions of this
cDNA. Suitable amplification conditions include those
described above for the original isolation of the CBG
cDNA. As is well known in the art, amplification
conditions may need to be varied in order to amplify
orthologous genes where the sequence identity is not
100%; in such cases, the use of nested primers, as
described above, may be beneficial. Resequencing of PCR
products obtained by these amplification procedures is
recommended; this will facilitate confirmation of the
CBG cDNA sequence and will also provide information on
natural variation in this sequence in different
ecotypes, cultivars and plant populations.

Oligonucleotides which are derived from the
CBG cDNA sequence and which are suitable for use as PCR
primers to amplify the CBG cDNA are encompassed within
the scope of the present invention. Preferably, such
oligonucleotide primers will comprise a sequence of 15-20
consecutive nucleotides of the CBG cDNA. To enhance
amplification specificity, primers of 20-30 nucleotides
or more in length may also be used.
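Basic properties of the example primers can be tabulated before choosing annealing conditions. This sketch uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a common rule of thumb for short oligonucleotides that is not part of this document. For 24-27-mers it is only a rough guide, and nearest-neighbor calculations are generally preferred in practice.

```python
PRIMERS = {
    "Seq. I.D. No. 9": "GGATTTGGACCTGAAAATATCAAT",
    "Seq. I.D. No. 10": "CAATGTTCTTACCCTGCAGTTCCC",
    "Seq. I.D. No. 11": "ATGGAGGTGTCTGTGTTGATGTGGGTA",
    "Seq. I.D. No. 12": "AATGCTGCTGCTGCTTCTAATACTTCC",
}


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb Tm (deg C) for short oligos: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, primer in PRIMERS.items():
    print(f"{name}: {len(primer)} nt, GC {gc_percent(primer):.1f}%, "
          f"Tm ~{wallace_tm(primer)} C")
```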



EXAMPLE 5

Use of the CBG cDNA to Produce
Plants with Modified Lignin Content

Once a gene (or cDNA) encoding a protein
involved in the determination of a particular plant
characteristic has been isolated, standard techniques
may be used to express the cDNA in transgenic plants in
order to modify that particular plant characteristic.
The basic approach is to clone the cDNA into a
transformation vector, such that it is operably linked
to control sequences (e.g., a promoter) that direct
expression of the cDNA in plant cells. The
transformation vector is then introduced into plant
cells by one of a number of techniques (e.g.,
electroporation), and progeny plants containing the
introduced cDNA are selected. Preferably, all or part of
the transformation vector will stably integrate into the
genome of the plant cell. That part of the
transformation vector which integrates into the plant
cell and which contains the introduced cDNA and
associated sequences for controlling expression (the
introduced "transgene") may be referred to as the
recombinant expression cassette.

Selection of progeny plants containing the
introduced transgene may be made based upon the
detection of an altered phenotype. Such a phenotype may
result directly from the cDNA cloned into the
transformation vector, or may be manifested as enhanced
resistance to a chemical agent (such as an antibiotic)
as a result of the inclusion of a dominant selectable
marker gene in the transformation vector.

The choice of (a) control sequences and (b)
how the cDNA (or selected portions of the cDNA) is
arranged in the transformation vector relative to the
control sequences determines, in part, how the plant
characteristic affected by the introduced cDNA is
modified. For example, the control sequences may be
tissue specific, such that the cDNA is only expressed in
particular tissues of the plant (e.g., vascular systems),
and so the affected characteristic will be modified only
in those tissues. The cDNA sequence may be arranged
relative to the control sequence such that the cDNA
transcript is expressed normally, or in an antisense
orientation. Expression of an antisense RNA that is the
reverse complement of the cloned cDNA will result in a
reduction of the targeted gene product (the targeted
gene product being the protein encoded by the plant gene
from which the introduced cDNA was derived). Over-expression
of the introduced cDNA, resulting from a
plus-sense orientation of the cDNA relative to the
control sequences in the vector, may lead to an increase
in the level of the gene product, or may result in a
reduction in the level of the gene product due to
co-suppression (also termed "sense suppression") of that
gene product.

The technical and scientific literature is replete with
successful examples of the modification of plant
characteristics by transformation with cloned cDNA
sequences. Selected examples, which serve to
illustrate the level of knowledge in this field of
technology, include:

U.S. Patent No. 5,451,514 to Boudet
(modification of lignin synthesis using antisense RNA
and co-suppression);

U.S. Patent No. 5,443,974 to Hitz
(modification of saturated and unsaturated fatty acid
levels using antisense RNA and co-suppression);

U.S. Patent No. 5,530,192 to Murase
(modification of amino acid and fatty acid composition
using antisense RNA);

U.S. Patent No. 5,455,167 to Voelker
(modification of medium chain fatty acids);

U.S. Patent No. 5,231,020 to Jorgensen
(modification of flavonoids using co-suppression); and

U.S. Patent No. 5,583,021 to Dougherty
(modification of virus resistance by expression of
plus-sense RNA).

These examples include descriptions of
transformation vector selection, transformation
techniques and the design of constructs intended
to over-express the introduced cDNA, untranslatable RNA
forms or antisense RNA. In light of the foregoing and
the provision herein of the CBG cDNA, it is thus
apparent that one of skill in the art will be able to
introduce this cDNA, or derivative forms of the cDNA
(e.g., antisense forms), into plants in order to produce
plants having modified lignin content. Example 6 below
provides an exemplary illustration of how an antisense
form of the CBG cDNA may be introduced into conifers
using ballistic transformation, in order to produce
conifers having altered lignin content.

a. Plant Types

Lignins are found in all plant types, and thus
DNA molecules according to the present invention (e.g.,
the CBG cDNA, homologs of the CBG cDNA and antisense
forms) may be introduced into any plant type in order to
modify the lignin composition of the plant. Thus, the
sequences of the present invention may be used to modify
lignin composition in any higher plant, including
monocotyledonous plants such as lily, corn, rice, wheat
and barley, as well as dicotyledonous plants such as
tomato, potato, soybean, cotton, tobacco, sunflower,
safflower and Brassica. As noted above, the present
invention is expected to be particularly useful in woody
species, such as species belonging to the genera Picea,
Pseudotsuga, Tsuga, Sequoia, Abies, Thuja, Libocedrus,
Chamaecyparis and Larix. Pines are expected to be a
particularly suitable choice for genetic modification by
the methods disclosed herein, including lodgepole pine
(Pinus contorta), the species from which the CBG cDNA
was cloned.

b. Vector Construction, Choice of Promoters

A number of recombinant vectors suitable for
stable transfection of plant cells or for the
establishment of transgenic plants have been described,
including those described in Pouwels et al. (19C1),
Weissbach and Weissbach (1989), and Gelvin et al.
(1990). Typically, plant transformation vectors include
one or more cloned plant genes (or cDNAs) under the
transcriptional control of 5' and 3' regulatory
sequences, and a dominant selectable marker. Such plant
transformation vectors typically also contain a promoter
regulatory region (e.g., a regulatory region controlling
inducible or constitutive, environmentally- or
developmentally-regulated, or cell- or tissue-specific
expression), a transcription initiation start site, a
ribosome binding site, an RNA processing signal, a
transcription termination site, and/or a polyadenylation
signal.

Examples of constitutive plant promoters that
may be useful for expressing the CBG cDNA include: the
cauliflower mosaic virus (CaMV) 35S promoter, which
confers constitutive, high-level expression in most
plant tissues (see, e.g., Odell et al., 1985; Dekeyser et
al., 1990; Terada and Shimamoto, 1990); the nopaline
synthase promoter (An et al., 1988); and the octopine
synthase promoter (Fromm et al., 1989).

A variety of plant gene promoters that are
regulated in response to environmental, hormonal,
chemical and/or developmental signals also can be used
for expression of the CBG cDNA in plant cells, including
promoters regulated by: (a) heat (Callis et al., 1988);
(b) light (e.g., the pea rbcS-3A promoter, Kuhlemeier et
al., 1989; the maize rbcS promoter, Schaffner and Sheen,
1991; and the chlorophyll a/b-binding protein promoter);
(c) hormones, such as abscisic acid (Marcotte et al.,
1989); (d) wounding (e.g., wun1, Siebertz et al., 1989);
and (e) chemicals such as methyl jasmonate or salicylic
acid. It may also be advantageous to employ tissue-specific
promoters, such as those described by Roshal et
al. (1987), Schernthaner et al. (1988) and Bustos et
al. (1989).

Plant transformation vectors may also include
RNA processing signals, for example introns, which may
be positioned upstream or downstream of the CBG cDNA
sequence in the transgene. In addition, the expression
vectors may also include additional regulatory sequences
from the 3'-untranslated region of plant genes, e.g., a
3' terminator region to increase mRNA stability,
such as the PI-II terminator region of potato or
the octopine or nopaline synthase 3' terminator regions.

Finally, as noted above, plant transformation
vectors may also include dominant selectable marker
genes to allow for the ready selection of transformants.
Such genes include those encoding antibiotic resistance
(e.g., resistance to hygromycin, kanamycin,
bleomycin, G418, streptomycin or spectinomycin) and
herbicide resistance genes (e.g., phosphinothricin
acetyltransferase).

c. Arrangement of CBG cDNA in Vector

As noted above, the particular arrangement of
the CBG cDNA in the transformation vector will be
selected according to the expression of the cDNA
desired.

Sense Expression

Where enhanced lignin synthesis is desired,
the CBG cDNA may be operably linked to a constitutive
high-level promoter such as the CaMV 35S promoter. As
noted below, modification of lignin synthesis may also
be achieved by introducing into a plant a transformation
vector containing a variant form of the CBG cDNA, for
example a form which varies from the exact nucleotide
sequence of the CBG cDNA, but which encodes a protein
that retains the functional characteristic of the CBG
protein, i.e., coniferin hydrolysis activity.

Sense Suppression

Constructs in which the CBG cDNA (or
variants thereof) is over-expressed may also be used to
obtain co-suppression of the endogenous CBG gene in the
manner described in U.S. Patent No. 5,231,020 to
Jorgensen. Such co-suppression (also termed sense
suppression) does not require that the entire CBG cDNA
be introduced into the plant cells, nor does it require
that the introduced sequence be exactly identical to the
CBG cDNA. However, as with antisense suppression, the
suppressive efficiency will be enhanced as (1) the
introduced sequence is lengthened and (2) the sequence
similarity between the introduced sequence and the
endogenous CBG gene is increased. Sense suppression is
believed to be modulated, in part, by the position in
the plant genome into which the introduced sequence
integrates.

Antisense Expression

A reduction of lignin synthesis may also
be obtained by introducing antisense constructs
based on the CBG cDNA sequence into plants. For
antisense suppression, the CBG cDNA is arranged in
reverse orientation relative to the promoter sequence in
the transformation vector. The introduced sequence need
not be the full-length CBG cDNA, and need not be exactly
homologous to the CBG cDNA. Generally, however, where
the introduced sequence is of shorter length, a higher
degree of homology to the native CBG sequence will be
needed for effective antisense suppression. Preferably,
the introduced antisense sequence in the vector will be
at least 30 nucleotides in length, and improved
antisense suppression will typically be observed as the
length of the antisense sequence increases. More preferably,
the length of the antisense sequence in the vector will
be greater than 100 nucleotides. Transcription of an
antisense construct as described results in the
production of RNA molecules that are the reverse
complement of mRNA molecules transcribed from the
endogenous CBG gene in the plant cell. Although the
exact mechanism by which antisense RNA molecules
interfere with gene expression has not been elucidated,
it is believed that antisense RNA molecules bind to the
endogenous mRNA molecules and thereby inhibit
translation of the endogenous mRNA.
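Designing the antisense insert amounts to taking the reverse complement of the chosen cDNA segment, so that transcription from the promoter yields RNA complementary to the endogenous CBG mRNA. A minimal sketch follows. The function names and the 30-nucleotide guard (taken from the preferred minimum length above) are illustrative choices, and the input is a placeholder built from the ORF primer sequence rather than the CBG cDNA itself.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G), preserving case."""
    return dna.translate(COMPLEMENT)[::-1]


def antisense_insert(cdna: str, start: int, end: int, min_length: int = 30) -> str:
    """Reverse complement of cdna[start:end], for cloning in reverse orientation
    relative to the promoter. Fragments shorter than min_length are rejected."""
    fragment = cdna[start:end]
    if len(fragment) < min_length:
        raise ValueError(f"fragment is {len(fragment)} nt; at least {min_length} nt preferred")
    return reverse_complement(fragment)


placeholder_cdna = "ATGGAGGTGTCTGTGTTGATGTGGGTA" * 5   # stand-in sequence only
insert = antisense_insert(placeholder_cdna, 0, 120)
print(len(insert), insert[:12])
```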

Suppression of endogenous CBG gene expression
can also be achieved using ribozymes. Ribozymes are
synthetic RNA molecules that possess highly specific
endoribonuclease activity. The production and use of
ribozymes are disclosed in U.S. Patent No. 4,987,071 to
Cech and U.S. Patent No. 5,543,508 to Haseloff. The
inclusion of ribozyme sequences within antisense RNAs
may be used to confer RNA-cleaving activity on the
antisense RNA, such that endogenous mRNA molecules that
bind to the antisense RNA are cleaved, which in turn
leads to enhanced antisense inhibition of endogenous
gene expression.

Untranslatable RNA

Suppression of native gene expression may be
achieved by transforming the plant with a sequence that
is homologous to the target gene, but which is rendered
untranslatable by a genetic modification such as the
introduction of a premature stop codon. This approach
is described in U.S. Patent No. 5,583,021. The
introduced CBG sequence is preferably at least 50-100
nucleotides in length; longer sequences, such as 100-250
nucleotides, are more preferred. The introduced sequence is
engineered to encode an untranslatable RNA; the
introduction of a premature stop codon early in the
coding region is a preferred way of achieving this. The
sequence need not be perfectly homologous to the target
CBG sequence, but a sequence homology of at least 80%, and
preferably at least 85%, will likely be more effective than
lower homologies.
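Introducing a premature stop codon can be expressed as a simple codon replacement. In this sketch the codon position, the choice of stop codon and the test sequence (the first 12 nt of the ORF primer, Seq. I.D. No. 11) are illustrative assumptions. Replacing GAG with TAG changes a single base, which keeps homology to the target CBG sequence as high as possible.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")


def make_untranslatable(cds: str, codon_index: int = 1, stop: str = "TAG") -> str:
    """Replace codon number codon_index (0-based; 0 is the ATG) with a stop codon."""
    if stop.upper() not in STOP_CODONS:
        raise ValueError(f"{stop} is not a stop codon")
    i = 3 * codon_index
    if i + 3 > len(cds):
        raise ValueError("codon_index is beyond the end of the sequence")
    return cds[:i] + stop.upper() + cds[i + 3:]


def first_in_frame_stop(cds: str):
    """0-based codon index of the first in-frame stop codon, or None if absent."""
    for k in range(0, len(cds) - 2, 3):
        if cds[k:k + 3].upper() in STOP_CODONS:
            return k // 3
    return None


orf_start = "ATGGAGGTGTCT"            # 5' end of the ORF, from Seq. I.D. No. 11
mutant = make_untranslatable(orf_start)
print(mutant, first_in_frame_stop(mutant))
```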

d. Transformation and Regeneration Techniques

Transformation and regeneration of both
monocotyledonous and dicotyledonous plant cells are now
routine, and the selection of the most appropriate
transformation and regeneration techniques will be
determined by the practitioner. The choice of method
will vary with the type of plant to be transformed;
those skilled in the art will recognize the suitability
of particular methods for given plant types. Suitable
methods may include, but are not limited to:
electroporation of plant protoplasts; liposome-mediated
transformation; polyethylene glycol-mediated transformation;
transformation using viruses; microinjection of plant
cells; microprojectile bombardment of plant cells;
vacuum infiltration; and Agrobacterium tumefaciens
(AT)-mediated transformation. Typical procedures for
transforming and regenerating plants are described in
the patent documents listed at the beginning of this
section. In addition, methods for transforming woody
species are described in Ellis et al. (1993), Ellis et
al. (1996), U.S. Patent No. 5,122,466, "Ballistic
Transformation of Conifers", and U.S. Patent No.
4,795,855, "Transformation and Foreign Gene Expression
with Woody Species".

e. Selection of Transformed Plants

Following transformation and regeneration of
plants with the transformation vector, transformed
plants are preferably selected using a dominant
selectable marker incorporated into the transformation
vector. Typically, such a marker will confer antibiotic
resistance on the seedlings of transformed plants, and
selection of transformants can be accomplished by
exposing the seedlings to appropriate concentrations of
the antibiotic.

After transformed plants are selected and
grown to maturity, they can be assayed to determine
whether coniferin β-glucosidase synthesis has been
altered as a result of the introduced transgene. This
can be done in several ways, including by extracting and
quantifying the enzyme activity as described in Example
6. In addition, lignification may be determined
histochemically, and lignin content may be quantified,
as described in Example 6. Also, antisense or sense
suppression of the endogenous CBG gene may be detected
by analyzing mRNA expression on Northern blots.

EXAMPLE 6

Introduction of Antisense CBG cDNA Sequence
into White Spruce (Picea glauca)

By way of example, the following methodology
may be used to produce white spruce trees having an
altered lignin content. The CBG cDNA is operably
linked, but in reverse orientation, to the enhanced
cauliflower mosaic virus (CaMV) 35S promoter in place of
the BT gene in plasmid pTVBT41100 (Ellis et al., 1993).
(Many other plant transformation vectors have been
described and would be suitable for introducing
CBG-based constructs into plants. Vector pBACGGUS, shown in
Fig. 4, is one such alternative vector that may be used.)
Somatic embryos of Picea glauca are differentiated from
an embryogenic white spruce callus line and cultured as
described by Ellis et al. (1993). Plasmid DNA is adhered
to 1-3 µm gold particles (0.5 µg DNA/mg gold) by calcium
chloride and spermidine precipitation. Gold particles
carrying the DNA are then loaded onto carrier sheets
at a rate of 0.05 mg/cm², and these particles are then
introduced into somatic embryos as described by Ellis et
al. (1991). Transformed embryos are selected using
kanamycin. Regeneration of transgenic plants (via the
production of embryogenic callus) is achieved using the
culture conditions described by Ellis et al. (1993).

In order to determine coniferin β-glucosidase
activity in the transgenic plants, the enzyme is
extracted as described in Example 1 above, and the
activity is assayed using the β-glucosidase assay
described in Example 3 above. Plants transformed with
the same vector without the CBG cDNA insert should
preferably be used as controls. In situ localization of
the enzyme activity can be determined using VRA-G as
described by Dharmawardhana et al. (1995). Lignin in
the stem sections is detected histochemically by basic
fuchsin-induced fluorescence and imaging on a confocal
laser scanning microscope as described by Dharmawardhana
et al. (1992). In order to determine the effect on
lignin content of introducing the antisense construct into
the plant, standard methods are used to quantify
lignin in the transformed plants (and control plants).
Standard methods of quantifying lignin include the
thioglycolic acid procedure as described by Whitmore
(1978) and the acetyl bromide procedure as described by
Iiyama and Wallis (1990).

EXAMPLE 7
Production of Sequence Variants

As noted above, modification of lignin
synthesis in plant cells can be achieved by transforming






DJB:4C« 5493-4il81.PX July 24. 1997 

plants with the CBG cDNA, with antisense constructs based on
the CBG cDNA, or with other variants of the CBG cDNA sequence.
With the provision of the CBG cDNA sequence herein, the
creation of variants of the CBG cDNA sequence by
standard mutagenesis techniques is now enabled.

Variant DNA molecules include those created by
standard DNA mutagenesis techniques, for example, M13
primer mutagenesis. Details of these techniques are
provided in Sambrook et al. (1989), Ch. 15. By the use
of such techniques, variants may be created that differ
in minor ways from the CBG cDNA. DNA molecules and
nucleotide sequences that are derivatives of those
specifically disclosed herein, and that differ from
those disclosed by the deletion, addition or
substitution of nucleotides while still encoding a
protein that possesses the functional characteristic of
the CBG protein (i.e., the ability to hydrolyze
coniferin), are comprehended by this invention. DNA
molecules and nucleotide sequences that are derived
from the CBG cDNA include DNA sequences that hybridize
under moderately stringent conditions to the DNA
sequences disclosed, or fragments thereof.

Hybridization conditions resulting in
particular degrees of stringency will vary depending
upon the nature of the hybridization method of choice
and the composition and length of the hybridizing DNA
used. Generally, the temperature of hybridization and
the ionic strength (especially the Na⁺ concentration) of
the hybridization buffer will determine the stringency



of hybridization. Calculations regarding hybridization
conditions required for attaining particular degrees of
stringency are discussed by Sambrook et al. (1989),
chapters 9 and 11. By way of illustration only, a
hybridization experiment may be performed by
hybridization of a DNA molecule (for example, a
variant of the CBG cDNA sequence) to a target DNA
molecule (for example, the native CBG cDNA sequence)
that has been electrophoresed in an agarose gel and
transferred to a nitrocellulose membrane by Southern
blotting (Southern, 1975), a technique well known in the
art and described in Sambrook et al. (1989).
Hybridization with a target probe labeled with [³²P]-dCTP
is generally carried out in a solution of high ionic
strength, such as 6× SSC, at a temperature that is 20-25°C
below the melting temperature, Tm, described below. For
such Southern hybridization experiments, where the target
DNA molecule on the Southern blot contains 10 ng of DNA
or more, hybridization is typically carried out for 6-8
hours using 1-2 ng/ml radiolabeled probe (of specific
activity equal to 10⁹ cpm/µg or greater). Following
hybridization, the nitrocellulose filter is washed to
remove background hybridization. The washing conditions
should be as stringent as possible to remove background
hybridization while retaining a specific hybridization
signal. The term Tm represents the temperature above
which, under the prevailing ionic conditions, the
radiolabeled probe molecule will not hybridize to its
target DNA molecule. The Tm of such a hybrid molecule
may be estimated from the following equation (Bolton and
McCarthy, 1962):

$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\,\mathrm{G{+}C}) - 0.63\,(\%\ \mathrm{formamide}) - \frac{600}{l}$$

where l = the length of the hybrid in base pairs.

This equation is valid for concentrations of Na+ in the
range of 0.01 M to 0.4 M, and it is less accurate for
calculations of Tm in solutions of higher [Na+]. The
equation is also primarily valid for DNAs whose G+C
content is in the range of 30% to 75%, and it applies to
hybrids greater than 100 nucleotides in length (the
behavior of oligonucleotide probes is described in
detail in Ch. 11 of Sambrook et al., 1989).

Thus, by way of example, for a 150 base pair
DNA probe derived from the first 150 base pairs of the
open reading frame of the CBG cDNA (with a hypothetical
%GC = 45%), a calculation of hybridization conditions
required to give particular stringencies may be made as
follows:

For this example, it is assumed that the
filter will be washed in 0.3x SSC solution following
hybridization, thereby

[Na+] = 0.045 M
%GC = 45%
Formamide concentration = 0
l = 150 base pairs

$$T_m = 81.5 + 16.6\,\log_{10}(0.045) + (0.41 \times 45) - \frac{600}{150} = 81.5 - 22.36 + 18.45 - 4.00$$

and so Tm ≈ 73.6 °C.

The Tm of double-stranded DNA decreases by
1-1.5 °C with every 1% decrease in homology (Bonner et
al., 1973). Therefore, for this given example, washing
the filter in 0.3x SSC at 58.6-63.6 °C will produce a
stringency of hybridization equivalent to 90%; that is,
DNA molecules with more than 10% sequence variation
relative to the target CBG cDNA will not hybridize.
Alternatively, washing the hybridized filter in 0.3x SSC
at a temperature of 64.6-67.6 °C will yield a
hybridization stringency of 94%; that is, DNA molecules
with more than 6% sequence variation relative to the
target CBG cDNA molecule will not hybridize. The above
example is given entirely by way of theoretical
illustration. One skilled in the art will appreciate
that other hybridization techniques may be utilized and
that variations in experimental conditions will
necessitate alternative calculations for stringency.
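The Tm estimate and the wash-temperature reasoning above reduce to simple arithmetic. The following is a minimal Python sketch, not a validated protocol calculator. It assumes the usual form of the Bolton and McCarthy equation, in which Tm increases with log10[Na+] (coefficient +16.6), and it assumes a linear drop in Tm of 1-1.5 °C per 1% mismatch. The names `melting_temperature` and `wash_window` are hypothetical. Under these assumptions, the 150 bp example (0.045 M Na+, 45% G+C, no formamide) evaluates to about 73.6 °C.

```python
import math


def melting_temperature(na_molar, percent_gc, percent_formamide, length_bp):
    """Estimate Tm (deg C) of a DNA-DNA hybrid (Bolton and McCarthy, 1962).

    Intended for 0.01-0.4 M Na+, 30-75% G+C and hybrids longer than ~100 bp;
    these ranges are documented here but not enforced.
    """
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc
            - 0.63 * percent_formamide - 600.0 / length_bp)


def wash_window(tm, percent_identity, drop_low=1.0, drop_high=1.5):
    """Return (low, high) wash temperatures for a target stringency.

    Assumes Tm falls by 1-1.5 deg C for every 1% mismatch (Bonner et al., 1973).
    """
    mismatch = 100.0 - percent_identity
    return tm - drop_high * mismatch, tm - drop_low * mismatch


# Worked example: 150 bp probe, 45% G+C, 0.3x SSC wash ([Na+] = 0.045 M), no formamide.
tm = melting_temperature(0.045, 45, 0, 150)
print(f"Tm = {tm:.1f} C")
for identity in (90, 94):
    low, high = wash_window(tm, identity)
    print(f"{identity}% stringency: wash at {low:.1f}-{high:.1f} C")
```

Values outside 0.01-0.4 M Na+, outside 30-75% G+C, or below about 100 bp should be treated with the caution noted in the text. The function simply extrapolates the formula in those cases.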

As used herein, moderate stringency conditions
are those under which DNA molecules with more than 25%
sequence variation (also termed "mismatch") will not
hybridize. As noted above, the invention encompasses
DNA molecules which hybridize under moderately stringent
conditions to the CBG cDNA sequence. More preferably,
such DNA molecules will hybridize under stringent
conditions, which are conditions under which DNA
molecules with more than 15% mismatch will not
hybridize. More preferably still, such DNA molecules
will hybridize under highly stringent conditions, i.e.,
those under which DNA sequences with more than 10%
mismatch will not hybridize. Finally, in the most
preferred embodiment, these DNA molecules will hybridize
to the CBG cDNA under extremely stringent conditions,
that is, conditions under which DNA sequences with more
than 6% mismatch will not hybridize.
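The stringency classes defined above are simple mismatch thresholds. The following is a minimal Python sketch of how a candidate sequence could be checked against them. It assumes two equal-length, ungapped, already-aligned sequences; real comparisons would need an alignment step first. The dictionary and function names are hypothetical.

```python
# Maximum sequence mismatch (%) that still hybridizes under each class defined above.
STRINGENCY_MAX_MISMATCH = {
    "moderate": 25.0,
    "stringent": 15.0,
    "highly stringent": 10.0,
    "extremely stringent": 6.0,
}


def percent_mismatch(seq_a, seq_b):
    """Percentage of positions that differ between two equal-length, gap-free sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x != y)
    return 100.0 * differences / len(seq_a)


def tolerated_classes(mismatch):
    """Stringency classes under which a hybrid with this mismatch is still expected to form."""
    return [name for name, limit in STRINGENCY_MAX_MISMATCH.items() if mismatch <= limit]


# 20 nt of the CBG cDNA starting at codon 20, versus a variant with two substitutions.
probe = "GTGACGACAGCTAGGCTGGA"
variant = "GTGACGACAGCGAGGCTGCA"
m = percent_mismatch(probe, variant)
print(m, tolerated_classes(m))
```

With two differences in 20 positions (10% mismatch), the variant falls within the moderate, stringent and highly stringent classes, but not the extremely stringent one.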
The degeneracy of the genetic code further
widens the scope of the present invention as it enables
major variations in the nucleotide sequence of a DNA
molecule while maintaining the amino acid sequence of
the encoded protein. For example, the 23rd amino acid
residue of the CBG protein is alanine. This is encoded
in the CBG cDNA by the nucleotide codon triplet GCT.
Because of the degeneracy of the genetic code, three
other nucleotide codon triplets (GCA, GCC and GCG) also
code for alanine. Thus, the nucleotide sequence of the
CBG cDNA could be changed at this position to any of
these three codons without affecting the amino acid
composition of the encoded protein or the
characteristics of the protein. The genetic code and
variations in nucleotide codons for particular amino
acids are presented in Tables 2 and 3. Based upon the
degeneracy of the genetic code, variant DNA molecules
may be derived from the cDNA molecules disclosed herein
using standard DNA mutagenesis techniques as described
above, or by synthesis of DNA sequences. Thus, this
invention also encompasses DNA sequences which encode
the CBG protein but which vary from the CBG cDNA
sequence by virtue of the degeneracy of the genetic
code.
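The silent codon change described above can be checked mechanically. The following is a minimal Python sketch that assumes the standard genetic code of Table 2. The names `CODON_TABLE`, `translate` and `synonymous_codons` are hypothetical helpers, not part of the original disclosure. The 69-nucleotide string is the first 23 codons of the CBG open reading frame.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acids for all 64 codons, ordered by first, second and third
# base in T, C, A, G order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon (a trailing partial codon is ignored)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def synonymous_codons(codon):
    """Other codons that encode the same amino acid as `codon`."""
    codon = codon.upper()
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa and c != codon)


# First 23 codons of the CBG open reading frame; codon 23 (GCT) encodes alanine.
cbg_5prime = ("ATGGAGGTGTCTGTGTTGATGTGGGTACTGCTCTTCTATTCC"
              "TTATTAGGTTTTCAAGTGACGACAGCT")
print(translate(cbg_5prime))
print(synonymous_codons("GCT"))

# Every silent variant at codon 23 still encodes the same peptide.
for alt in synonymous_codons("GCT"):
    variant = cbg_5prime[:66] + alt
    assert translate(variant) == translate(cbg_5prime)
```

Building the table from `itertools.product` keeps it compact. The ordering of `AMINO` must match the T, C, A, G iteration order exactly.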
TABLE 2
The Genetic Code

First Position                  Second Position                 Third Position
(5' end)        T           C           A           G           (3' end)

T               Phe         Ser         Tyr         Cys         T
                Phe         Ser         Tyr         Cys         C
                Leu         Ser         Stop (och)  Stop        A
                Leu         Ser         Stop (amb)  Trp         G

C               Leu         Pro         His         Arg         T
                Leu         Pro         His         Arg         C
                Leu         Pro         Gln         Arg         A
                Leu         Pro         Gln         Arg         G

A               Ile         Thr         Asn         Ser         T
                Ile         Thr         Asn         Ser         C
                Ile         Thr         Lys         Arg         A
                Met         Thr         Lys         Arg         G

G               Val         Ala         Asp         Gly         T
                Val         Ala         Asp         Gly         C
                Val         Ala         Glu         Gly         A
                Val (Met)   Ala         Glu         Gly         G

"Stop (och)" stands for the ochre termination triplet, and
"Stop (amb)" for the amber. ATG is the most common
initiator codon; GTG usually codes for valine, but it can
also code for methionine to initiate an mRNA chain.
TABLE 3
The Degeneracy of the Genetic Code

Number of       Amino Acid                      Total Number
Synonymous                                      of Codons
Codons

6               Leu, Ser, Arg                   18
4               Gly, Pro, Ala, Val, Thr         20
3               Ile                             3
2               Phe, Tyr, Cys, His, Gln,        18
                Glu, Asn, Asp, Lys
1               Met, Trp                        2

Total number of codons for amino acids          61
Number of codons for termination                3
Total number of codons in genetic code          64
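The counts in Table 3 follow directly from Table 2. The following minimal Python sketch rebuilds them from the standard genetic code, written as a 64-character one-letter string in T, C, A, G order. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code; codons ordered T, C, A, G at each position ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codons = ["".join(c) for c in product("TCAG", repeat=3)]
assert len(codons) == len(AMINO) == 64

codons_per_residue = Counter(AMINO)           # e.g. 'L' -> 6
stop_codons = codons_per_residue.pop("*")      # termination codons

# Group amino acids by their number of synonymous codons (the rows of Table 3).
degeneracy_classes = {}
for residue, n in codons_per_residue.items():
    degeneracy_classes.setdefault(n, []).append(residue)

for n in sorted(degeneracy_classes, reverse=True):
    members = sorted(degeneracy_classes[n])
    print(n, " ".join(members), n * len(members))
print("sense codons:", sum(codons_per_residue.values()), "stop codons:", stop_codons)
```

The script reproduces the row totals of Table 3 (18, 20, 3, 18 and 2). Together with the termination codons, these account for all 64 triplets.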



One skilled in the art will recognize that DNA
mutagenesis techniques may be used not only to produce
variant DNA molecules but also to facilitate the
production of proteins which differ in certain
structural aspects from the CBG protein, yet which are
clearly derivative of the CBG protein and which maintain
the essential characteristics of the CBG protein. Newly
derived proteins may also be selected in order to obtain
variations on the characteristic of the CBG protein, as
will be more fully described below. Such derivatives
include those with variations in amino acid sequence
including minor deletions, additions and substitutions.

While the site for introducing an amino acid
sequence variation is predetermined, the mutation per se
need not be predetermined. For example, in order to
optimize the performance of a mutation at a given site,
random mutagenesis may be conducted at the target codon
or region and the expressed protein variants screened
for the optimal combination of desired activity.
Techniques for making substitution mutations at
predetermined sites in DNA having a known sequence as
described above are well known.
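Random or saturation mutagenesis at a chosen codon can be planned in silico by enumerating every codon that could occupy the site. The following minimal Python sketch assumes the standard genetic code. `saturation_variants` is a hypothetical helper, and choosing codon 23 of the CBG reading frame is purely illustrative. Screening the expressed variants for activity remains an experimental step that the code does not model.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def saturation_variants(cds, codon_number):
    """Group every single-codon replacement at `codon_number` (1-based) by encoded residue."""
    start = 3 * (codon_number - 1)
    original = cds[start:start + 3].upper()
    if codon_number < 1 or len(original) != 3:
        raise ValueError("codon_number lies outside the coding sequence")
    library = defaultdict(list)
    for codon, residue in CODON_TABLE.items():
        if codon != original:
            library[residue].append(cds[:start] + codon + cds[start + 3:])
    return dict(library)


# Illustrative target: codon 23 (GCT, Ala) of the CBG open reading frame.
cbg_5prime = ("ATGGAGGTGTCTGTGTTGATGTGGGTACTGCTCTTCTATTCC"
              "TTATTAGGTTTTCAAGTGACGACAGCT")
library = saturation_variants(cbg_5prime, 23)
print(sum(len(v) for v in library.values()), "variants")
print({residue: len(seqs) for residue, seqs in sorted(library.items())})
```

Grouping by encoded residue makes the three silent (Ala) variants and the three stop-codon variants easy to exclude before screening.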

Amino acid substitutions are typically of single
residues; insertions usually will be on the order of
from about 1 to 10 amino acid residues; and deletions
will range from about 1 to 30 residues. Substitutions,
deletions, insertions or any combination thereof may be
combined to arrive at a final construct. The
mutations that are made in the DNA encoding the protein
must not place the sequence out of reading frame and
preferably will not create complementary regions that
could produce secondary mRNA structure.
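The reading-frame constraint and the typical size ranges can be screened before a construct is made. The following minimal Python sketch assumes that both sequences start at the ATG and end with the natural stop codon. `check_cds_edit` and its default limits (10 inserted or 30 deleted residues, taken from the typical ranges above) are illustrative. Prediction of mRNA secondary structure is not attempted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds_edit(original_cds, edited_cds, max_insert_aa=10, max_delete_aa=30):
    """List problems with an edited coding sequence; an empty list means none were found.

    Assumes both sequences start at the ATG and end with the natural stop codon.
    """
    problems = []
    delta = len(edited_cds) - len(original_cds)
    if delta % 3:
        problems.append(f"length change of {delta} nt shifts the reading frame")
    elif delta > 3 * max_insert_aa:
        problems.append(f"insertion exceeds {max_insert_aa} residues")
    elif -delta > 3 * max_delete_aa:
        problems.append(f"deletion exceeds {max_delete_aa} residues")
    seq = edited_cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # The last codon is the natural stop; any earlier stop truncates the protein.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature stop codon introduced")
    return problems


original = "ATGGCTAAATAG"                              # toy Met-Ala-Lys-stop sequence
print(check_cds_edit(original, "ATGGCCGCTAAATAG"))   # one codon inserted in frame
print(check_cds_edit(original, "ATGCTAAATAG"))       # single-base deletion
print(check_cds_edit(original, "ATGTAAAAATAG"))      # GCT replaced by TAA
```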

Substitutional variants are those in which at
least one residue in the amino acid sequence has been
removed and a different residue inserted in its place.
Such substitutions generally are made in accordance with
the following Table 4 when it is desired to finely
modulate the characteristics of the protein. Table 4
shows amino acids which may be substituted for an
original amino acid in a protein and which are regarded
as conservative substitutions.

TABLE 4

Original Residue    Conservative Substitutions

Ala                 ser
Arg                 lys
Asn                 gln; his
Asp                 glu
Cys                 ser
Gln                 asn
Glu                 asp
Gly                 pro
His                 asn; gln
Ile                 leu; val
Leu                 ile; val
Lys                 arg; gln; glu
Met                 leu; ile
Phe                 met; leu; tyr
Ser                 thr
Thr                 ser
Trp                 tyr
Tyr                 trp; phe
Val                 ile; leu
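Table 4 can be used directly as a lookup when screening candidate substitutions. The following minimal Python sketch transcribes the table using three-letter codes. `CONSERVATIVE` and `is_conservative` are hypothetical names.

```python
# Table 4 as a lookup: original residue -> residues regarded as conservative substitutions.
CONSERVATIVE = {
    "Ala": {"Ser"}, "Arg": {"Lys"}, "Asn": {"Gln", "His"}, "Asp": {"Glu"},
    "Cys": {"Ser"}, "Gln": {"Asn"}, "Glu": {"Asp"}, "Gly": {"Pro"},
    "His": {"Asn", "Gln"}, "Ile": {"Leu", "Val"}, "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"}, "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"}, "Ser": {"Thr"}, "Thr": {"Ser"},
    "Trp": {"Tyr"}, "Tyr": {"Trp", "Phe"}, "Val": {"Ile", "Leu"},
}


def is_conservative(original, replacement):
    """True if Table 4 lists `replacement` as a conservative substitute for `original`."""
    return replacement.capitalize() in CONSERVATIVE.get(original.capitalize(), set())


# Residue 23 of the CBG protein is alanine.
print(is_conservative("Ala", "Ser"))
print(is_conservative("Ala", "Gly"))
```

The lookup is directional because Table 4 is not symmetric. For example, Ser is listed as a substitute for Ala, but Ala is not listed as a substitute for Ser.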



Substantial changes in enzymatic function or
other features are made by selecting substitutions that
are less conservative than those in Table 4, i.e.,
selecting residues that differ more significantly in
their effect on maintaining (a) the structure of the
polypeptide backbone in the area of the substitution,
for example, as a sheet or helical conformation, (b) the
charge or hydrophobicity of the molecule at the target
site, or (c) the bulk of the side chain. The
substitutions which in general are expected to produce
the greatest changes in protein properties will be those
in which (a) a hydrophilic residue, e.g., seryl or
threonyl, is substituted for (or by) a hydrophobic
residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or
alanyl; (b) a cysteine or proline is substituted for (or
by) any other residue; (c) a residue having an
electropositive side chain, e.g., lysyl, arginyl, or
histidyl, is substituted for (or by) an electronegative
residue, e.g., glutamyl or aspartyl; or (d) a residue
having a bulky side chain, e.g., phenylalanine, is
substituted for (or by) one not having a side chain,
e.g., glycine.
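Criteria (a) to (d) above can be expressed as set-membership tests. The following minimal Python sketch uses only the example residues named in the text ("e.g."). The groups are therefore illustrative, not a complete physicochemical classification, and a fuller treatment would need additional residues in each group. `radical_criteria` is a hypothetical helper.

```python
# Example residues named in the text (illustrative, not exhaustive).
HYDROPHILIC = {"Ser", "Thr"}
HYDROPHOBIC = {"Leu", "Ile", "Phe", "Val", "Ala"}
CYS_PRO = {"Cys", "Pro"}
ELECTROPOSITIVE = {"Lys", "Arg", "His"}
ELECTRONEGATIVE = {"Glu", "Asp"}
BULKY = {"Phe"}
NO_SIDE_CHAIN = {"Gly"}


def radical_criteria(original, replacement):
    """Return which of criteria (a)-(d) a substitution meets, using the example residues."""
    a, b = original.capitalize(), replacement.capitalize()

    def swaps(group_x, group_y):
        # True if the substitution moves between the two groups in either direction.
        return (a in group_x and b in group_y) or (a in group_y and b in group_x)

    met = []
    if swaps(HYDROPHILIC, HYDROPHOBIC):
        met.append("a")
    if a != b and {a, b} & CYS_PRO:
        met.append("b")
    if swaps(ELECTROPOSITIVE, ELECTRONEGATIVE):
        met.append("c")
    if swaps(BULKY, NO_SIDE_CHAIN):
        met.append("d")
    return met


for pair in [("Ser", "Leu"), ("Cys", "Ser"), ("Lys", "Glu"), ("Phe", "Gly"), ("Leu", "Ile")]:
    print(pair, radical_criteria(*pair))
```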

The effects of these amino acid substitutions or
deletions or additions may be assessed for derivatives
of the CBG protein by analyzing the ability of the
derivative proteins to hydrolyze coniferin by the assay
described herein.
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
GCTCTAGAGC GAYMGIAAYA AYTTTCC 27

(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
GCTCTAGAGC GAYMGIAAYA AYTTCCC 27

(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GAYMGIAAYA AYTTCCCIWS IGWTT 25

(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
CACATATCTG TGATATTGGT CG 22

(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
CCATCTTCTC GGACTGCTC 19

(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1909
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

ggatttggac ctgaaaatat caatttcaaa gcaattccag agggataacg 50
tgggatcctt accattacca acaacccacc attccgccct gccgacctca 100
ggcatatttt gattctattt aaccattaat tcatctgggc agttgcgatt 150
ctgtataatt cgatcgctcc gttttagcag 180

ac atg gag gtg tct gtg ttg atg tgg gta ctg ctc ttc tat tcc 224
   Met Glu Val Ser Val Leu Met Trp Val Leu Leu Phe Tyr Ser
   1               5                   10

tta tta ggt ttt caa gtg acg aca gct agg ctg gac agg aac aac 269
Leu Leu Gly Phe Gln Val Thr Thr Ala Arg Leu Asp Arg Asn Asn
15                  20                  25

ttc ccc tca gat ttc atg ttc ggc aca gcc tct tca gcg tat cag 314
Phe Pro Ser Asp Phe Met Phe Gly Thr Ala Ser Ser Ala Tyr Gln
30                  35                  40

tat gaa gga gca gtc cga gaa gat ggc aag ggt cct agc aca tgg 359
Tyr Glu Gly Ala Val Arg Glu Asp Gly Lys Gly Pro Ser Thr Trp
45                  50                  55

gac gcc tta aca cat atg cct ggt aga ata aaa gat agc agc aat 404
Asp Ala Leu Thr His Met Pro Gly Arg Ile Lys Asp Ser Ser Asn
60                  65                  70

gga gac gtg gca gtc gac caa tat cac aga tat atg gaa gat atc 449
Gly Asp Val Ala Val Asp Gln Tyr His Arg Tyr Met Glu Asp Ile
75                  80                  85

gag ctt atg gct tca ctt gga cta gat gcc tat aga ttc tcc ata 494
Glu Leu Met Ala Ser Leu Gly Leu Asp Ala Tyr Arg Phe Ser Ile
90                  95                  100

tcc tgg tct cga atc ctt cca gaa gga aga ggt gaa att aac atg 539
Ser Trp Ser Arg Ile Leu Pro Glu Gly Arg Gly Glu Ile Asn Met
105                 110                 115

gct ggg att gaa tat tac aat aat ctg att gac gct ctt ctg caa 584
Ala Gly Ile Glu Tyr Tyr Asn Asn Leu Ile Asp Ala Leu Leu Gln
120                 125                 130

aat ggg atc cag ccg ttc gtg aca ttg ttc cat ttc gat ctt ccc 629
Asn Gly Ile Gln Pro Phe Val Thr Leu Phe His Phe Asp Leu Pro
135                 140                 145

aaa gca ctt gaa gac tcc tat ggg gga tgg ctg agt cct caa ata 674
Lys Ala Leu Glu Asp Ser Tyr Gly Gly Trp Leu Ser Pro Gln Ile
150                 155                 160

att aac gac ttc gaa gcc tat gca gag att tgc ttc cgg gca ttc 719
Ile Asn Asp Phe Glu Ala Tyr Ala Glu Ile Cys Phe Arg Ala Phe
165                 170                 175

ggt gac cgt gtc aaa tat tgg gcg aca gtg aac gag cca aat ctg 764
Gly Asp Arg Val Lys Tyr Trp Ala Thr Val Asn Glu Pro Asn Leu
180                 185                 190

ttt gtg ccg ttg gga tac acc gtc gga ata ttt cca ccg acg agg 809
Phe Val Pro Leu Gly Tyr Thr Val Gly Ile Phe Pro Pro Thr Arg
195                 200                 205

tgt gct gcc cct cac gcc aat cct ttg tgc atg aca ggg aat tgc 854
Cys Ala Ala Pro His Ala Asn Pro Leu Cys Met Thr Gly Asn Cys
210                 215                 220

tcg tca gca gag cca tat cta gct gca cat cac gtt ttg ctc gcc 899
Ser Ser Ala Glu Pro Tyr Leu Ala Ala His His Val Leu Leu Ala
225                 230                 235

cac gca tct gca gtg gag aaa tat agg gag aaa tat cag aaa att 944
His Ala Ser Ala Val Glu Lys Tyr Arg Glu Lys Tyr Gln Lys Ile
240                 245                 250

caa gga gga tct ata ggg tta gtt ata agc gcg cca tgg tac gaa 989
Gln Gly Gly Ser Ile Gly Leu Val Ile Ser Ala Pro Trp Tyr Glu
255                 260                 265

ccc ttg gaa aat tct cca gaa gag aga tca gct gtt gat aga att 1034
Pro Leu Glu Asn Ser Pro Glu Glu Arg Ser Ala Val Asp Arg Ile
270                 275                 280

tta tcc ttc aat ctc cga tgg ttt ttg gac cca att gtt ttt gga 1079
Leu Ser Phe Asn Leu Arg Trp Phe Leu Asp Pro Ile Val Phe Gly
285                 290                 295

gat tat cca caa gaa atg cgt gaa aga tta gga tcg cgc tta ccc 1124
Asp Tyr Pro Gln Glu Met Arg Glu Arg Leu Gly Ser Arg Leu Pro
300                 305                 310

tcc ata tcc tcg gaa cta tct gcg aaa ctt cgg gga tcg ttc gac 1169
Ser Ile Ser Ser Glu Leu Ser Ala Lys Leu Arg Gly Ser Phe Asp
315                 320                 325

tat atg ggt att aat cac tat aca acc tta tat gca aca agc act 1214
Tyr Met Gly Ile Asn His Tyr Thr Thr Leu Tyr Ala Thr Ser Thr
330                 335                 340

cct ccc ctt tcc ccc gac cac acg caa tat cta tat cca gac tct 1259
Pro Pro Leu Ser Pro Asp His Thr Gln Tyr Leu Tyr Pro Asp Ser
345                 350                 355

agg gtt tat ctg act gga gag cgc cac gga gtc tcc atc gga gaa 1304
Arg Val Tyr Leu Thr Gly Glu Arg His Gly Val Ser Ile Gly Glu
360                 365                 370

cgg aca ggg atg gac ggt ttg ttt gtg gta cct cat gga att caa 1349
Arg Thr Gly Met Asp Gly Leu Phe Val Val Pro His Gly Ile Gln
375                 380                 385

aaa ata gtg gag tat gta aaa gaa ttc tat gac aac ccg act att 1394
Lys Ile Val Glu Tyr Val Lys Glu Phe Tyr Asp Asn Pro Thr Ile
390                 395                 400

att atc gca gag aac ggt tat cca gag tct gag gaa tcc tcg tcg 1439
Ile Ile Ala Glu Asn Gly Tyr Pro Glu Ser Glu Glu Ser Ser Ser
405                 410                 415

act ctg caa gaa aat cta aac gat gtg agg aga ata agg ttt cac 1484
Thr Leu Gln Glu Asn Leu Asn Asp Val Arg Arg Ile Arg Phe His
420                 425                 430

gga gat tgt ttg agt tat ctc agt gca gca atc aaa aat ggc tca 1529
Gly Asp Cys Leu Ser Tyr Leu Ser Ala Ala Ile Lys Asn Gly Ser
435                 440                 445

gat gtt cga ggg tac ttt gtg tgg tca ctt ctg gat aat ttt gag 1574
Asp Val Arg Gly Tyr Phe Val Trp Ser Leu Leu Asp Asn Phe Glu
450                 455                 460

tgg gca ttt ggg tat acc att aga ttt ggt ctt tat cac gtg gat 1619
Trp Ala Phe Gly Tyr Thr Ile Arg Phe Gly Leu Tyr His Val Asp
465                 470                 475

ttc att tct gat caa aag aga tat ccc aag ctc tcg gct caa tgg 1664
Phe Ile Ser Asp Gln Lys Arg Tyr Pro Lys Leu Ser Ala Gln Trp
480                 485                 490

ttc aga caa ttt ctt cag cac gac gat cag gga agt att aga agc 1709
Phe Arg Gln Phe Leu Gln His Asp Asp Gln Gly Ser Ile Arg Ser
495                 500                 505

agc agc agc att tag actgcgttgt ctatttgcta atcaaagcgc 1754
Ser Ser Ser Ile
510

acacattcct gcaactctac ccaaaatcct gcaagcaaat atgttgtgtt 1804
cggatctatc caccgtgaga cacattacaa agaaatcatc aatctattcc 1854
aaaatgcaga aaaccccatt cagatgttct agggaactgc agggtaagaa 1904
cattg 1909



(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TAGCTAGCAG GCTGCACAGG AACAACTTC 29

(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CTCGAGACAA GCAGTCTAAA TGCT 24



(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
GGATTTGGAC CTGAAAATAT CAAT 24

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CAATGTTCTT ACCCTGCAGT TCCC 24

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
ATGGAGGTGT CTGTGTTGAT GTGGGTA 27

(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 27
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
AATGCTGCTG CTGCTTCTAA TACTTCC 27
We claim:

1. An isolated nucleic acid molecule comprising
at least 15 consecutive nucleotides of the sequence shown
in Seq. I.D. No. 6 and encoding a coniferin β-glucosidase
enzyme.

2. An isolated nucleic acid molecule according
to claim 1 wherein the molecule comprises at least 20
consecutive nucleotides of the sequence shown in Seq.
I.D. No. 6.

3. An isolated nucleic acid molecule according
to claim 1 wherein the molecule comprises at least 30
consecutive nucleotides of the sequence shown in Seq.
I.D. No. 6.

4. The isolated nucleic acid according to claim
1 wherein the nucleic acid molecule comprises the
nucleotide sequence shown in Seq. I.D. No. 6.

5. An isolated nucleic acid molecule which
encodes a coniferin β-glucosidase enzyme and which
hybridizes under conditions of at least moderate
stringency to the nucleotide sequence shown in Seq. I.D.
No. 6.

6. A coniferin β-glucosidase enzyme encoded by a
nucleic acid molecule according to claims 1-5.

7. A recombinant vector comprising a DNA sequence
according to claims 1-5.

8. A transgenic plant transformed with a vector
according to claim 7.

9. A transgenic plant according to claim 8
wherein the plant has an altered lignin content compared
to an untransformed plant of the same species.

10. A transgenic plant according to claim 9
wherein the lignin content is reduced compared to an
untransformed plant of the same species.

11. A transgenic plant according to claim 9
wherein the plant is a conifer.

12. A transgenic plant according to claim 9
wherein the plant is a Pinus species.

13. An isolated oligonucleotide which comprises
at least 15 consecutive nucleotides of the sequence shown
in Seq. I.D. No. 6 or its complementary strand.

14. An oligonucleotide according to claim 13
wherein the oligonucleotide comprises at least 30
consecutive nucleotides of the sequence shown in Seq.
I.D. No. 6 or its complementary strand.

15. An oligonucleotide according to claim 13
wherein the oligonucleotide comprises at least 100
consecutive nucleotides of the sequence shown in Seq.
I.D. No. 6 or its complementary strand.

16. A recombinant vector comprising a DNA
sequence according to claims 13-15.

17. A transgenic plant transformed with a vector
according to claim 16.

18. A transgenic plant according to claim 17
wherein the plant has an altered lignin content compared
to an untransformed plant of the same species.

19. A transgenic plant according to claim 18
wherein the lignin content is reduced compared to an
untransformed plant of the same species.

20. A transgenic plant according to claim 19
wherein the plant is a conifer.

21. A transgenic plant according to claim 19
wherein the plant is a Pinus species.

22. A method of producing a plant with an
altered lignin content relative to an untransformed plant
of that species, comprising introducing into the plant a
recombinant vector that comprises a promoter operably
linked to a nucleic acid which hybridizes under
conditions of moderate stringency to the sequence shown
in Seq. I.D. No. 6 and which encodes a coniferin
β-glucosidase enzyme.

23. A transgenic plant produced according to the
method of claim 22.

24. A transgenic plant comprising, integrated
into its genome, a promoter operably linked to a nucleic
acid which hybridizes under conditions of moderate
stringency to the sequence shown in Seq. I.D. No. 6 and
which encodes a coniferin β-glucosidase enzyme.

25. A method of producing a plant with an
altered lignin content relative to an untransformed plant
of that species, comprising introducing into the plant a
recombinant vector that comprises a promoter operably
linked to an antisense nucleic acid which, when expressed
in cells of the plant, inhibits the expression of a
native coniferin β-glucosidase gene.

26. A transgenic plant produced according to the
method of claim 25.

27. A transgenic plant comprising, integrated
into its genome, a promoter operably linked to an
antisense nucleic acid which, when expressed in cells of
the plant, inhibits the expression of a native coniferin
β-glucosidase gene.

28. A method of producing a plant with an
altered lignin content relative to an untransformed plant
of that species, comprising introducing into the plant a
nucleic acid molecule comprising a coding sequence
operably linked to a promoter sequence, wherein the
coding sequence encodes an untranslatable plus-sense
transcript that shares at least 80% sequence similarity
with a transcript of a native coniferin β-glucosidase
gene.

29. A transgenic plant produced according to the
method of claim 28.

30. A transgenic plant including, integrated
into its genome, a nucleic acid molecule comprising a
coding sequence operably linked to a promoter sequence,
wherein the coding sequence encodes an untranslatable
plus-sense transcript that shares at least 80% sequence
similarity with a transcript of a native coniferin
β-glucosidase gene.

31. An isolated nucleic acid which encodes a
coniferin β-glucosidase.

32. The isolated nucleic acid according to claim
31 wherein the encoded coniferin β-glucosidase has an
amino acid sequence as shown in Seq. I.D. No. 6.

33. A method of isolating a nucleotide sequence
encoding a coniferin β-glucosidase enzyme, the method
comprising hybridizing a nucleotide preparation with a
DNA molecule comprising at least 15 consecutive
nucleotides of the sequence set forth in Seq. I.D. No. 6.

34. An isolated nucleic acid molecule as
defined in claim 1 substantially as herein described with
reference to any example thereof and with or without
reference to the accompanying drawings.

35. An isolated nucleic acid molecule as
claimed in claim 5 substantially as herein described with
reference to any example thereof and with or without
reference to the accompanying drawings.

36. A coniferin β-glucosidase enzyme as claimed
in claim 6 substantially as herein described with
reference to any example thereof and with or without
reference to the accompanying drawings.

37. A recombinant vector as claimed in claim 7
or claim 16 substantially as herein described with
reference to any example thereof and with or without
reference to the accompanying drawings.

38. A transgenic plant as defined in claim 8 or
claim 17 substantially as herein described with reference
to any example thereof and with or without reference to
the accompanying drawings.
39. An isolated oligonucleotide as defined in
claim 13 substantially as herein described with reference
to any example thereof and with or without reference to
the accompanying drawings.

40. A method as claimed in any one of claims
22, 25 and 28 of producing a plant with an altered lignin
content relative to an untransformed plant of that
species substantially as herein described with reference
to any example thereof and with or without reference to
the accompanying drawings.

41. A transgenic plant as claimed in any one of
claims 23, 26, 27, 29 and 30 substantially as herein
described with reference to any example thereof and with
or without reference to the accompanying drawings.

42. An isolated nucleic acid as defined in
claim 31 substantially as herein described with reference
to any example thereof and with or without reference to
the accompanying drawings.

43. A method as claimed in claim 33 of
isolating a nucleotide sequence encoding a coniferin
β-glucosidase enzyme substantially as herein described with
reference to any example thereof and with or without
reference to the accompanying drawings.

END OF CLAIMS
Figure 1 (nucleotide sequence of the CBG cDNA with the deduced amino acid sequence in one-letter code; the same sequence is set out in SEQ ID NO: 6)
CBGAA MEVSVLMWVLLFYSLLGF-Q VTTARLDRN N

L41869 MRSSPVL—LLVIALVAA-AHLAPLECDGPNPNPEIGNTGGLSRQ G 

J26025 T— KLGSLLLCALLLAGF-A-LTMSKAAKTDPPI—HCA-SLNRS S 

J39228 LLLLGF-A-IANTNAARTDPPV— VCA-TLNRT K 

<56733 LLSI-T-TTHIHAFKPLPISFDDFS-DLNRS C 

P26204 M— DFIVAIFALFVISSF-T-ITSTNA—VEASTLLDIG-NLSRS S 

<g1986 MASKHSLHLFGLLIVFLV-S-LLLVLTNQATAFDGDFIPLNFSRS Y 

335175 M LVLFIS-L-LALTRPAMGTDDDDDNIPDDFSRK Y 

J33817 MAJ.LLASAIKHTAHPAGLRSH PNKESFSRHHLCSSPQKISKRRSWLS 

\4 8860P MAPLLAAAMNKAAAHPGLRSHLVGPNNESFSRHHLPSSSPQSSKRRCNLS 

<78433 MA-LLCSALSNSTH-PSFRSH-IGANS£NLW— HLSADPAQKSKRRCNLT 

<89413 KVLQKLPLIGLLLLLT— — IVASPANAD-GPVCPPSNKLSBAS 

352771P — KFPLLGLLLLVT LVGSPTRAEEGPVCPKTETLSRAS 

300326 MKLLHGLALVFLLAAASCK ADEEITCEENNP 

)(79195 MKLL-GFALAILtW ATCK PEEEITCEEMV? 

LI 1 4 54 MKLL-MLAFVFLIALATCK GDEFVCEENEP 

LPH3HU WEKF f SQPKFERDLF 

\48969P M 

308638 M 

S45675 WP 



29 
43 
39 
29 
29 
39 
44 
33 
47 
50 
45 
39 
37 
31 
30 
29 
15 

— 1 

— 1 

— 3 



CBGAA    fpS DFMFGTASSAYQYEGAVRE  51
L41869   FPA G FVFGTAA5 AYQ VEGMARQ  65
U26025   FDA LEPGFI FGTAS/AYQFEGAAKE  64
U39228   FDT LFPGFTFGTATASVQLEGAANI  54
X56733   f;^ PGFVFGTASSAFQYEGAAFE  51
P26204   61
X94986   DFI FGTATSAYQI EGAAMK  66
S35175   FPD Opi FGTATSAYQI EGEATA  55
U33817   FRPR^QTISSESAG^HRLSPWEXPRRDWFPPSFLFGAATSAyQI£GAWKE  97
A48860P  FTTRSARVGSQM-GVQMLSPSEIPQRDWFPSOFTFGAATSAYQIEGAWNE  99
X78433   LSSRAARISSALESAKQVKPWQVPKRDWFPPEFMFGAASAAYQIEGAWNE  95
X89413   - FPEGFLFGTATAAYOVEGAINE  61
S52771P  59
Q00326   FTCSNTDILSSKN- FGKDFI FGVASSAYQI EGGR- -  64
X79195   FTCSQTDRFNKQD FES DFI FGVASSAYQI EGGR- -  63
L11454   FTCNQTKLFNSGN FEKGFI FGVAS S A YQ VEGG R - -  62
LPH3HU   YKGT-- pj^DD FLWG VS S S AYQ I EGAW DA  41
A48969P  -FPSD FKWGVAT AAYQ I EG AYN E  27
Q08638   fjVKK - FPEGFtWGVATAS YQI ECSPLA  27
S45675   - - - AAOOTATAP DAALT FPEGFLWGSATASYQIEGAAAE  39



Figure 3 






CBGAA    DGKGPSTWDALTHM-PGRI-KDSSNGDVAVDCYHRYMEDIELMASLGLDA  99
L41869   GGRGPCIWDAE-VAI-QGMI-AGNGTADVTVDEYHRYKEDVGIMKNMGFDA  113
U26025   DGRGPSIWDTYTHNHSERI-KDGSNGDVAVDQVHRYKEDVRIMKKMGFDA  113
U39228   DGRGPSIWDArrHNHPEKI-TDGSNGDVAIDQYhRYKEDVAIMKDMGLDA  103
X56733   DGKGPSIWDTFTHKYPEnl-KDRTNGDVAIDEYHRYKEDIGIMKDMNLDA  100
P26204   GGRGPSIWDTFTHKYPEKI-RDGSNADITVDQYHRYKEDVGIMKDONMDS  110
X94986   cGRGASWDTFTHQYPERI-LDHSTGDVA0G^YYRFKGDIQ^IVKNMGFNA  115
S35175   KGRAPSVWDIFSKETPDRI-LDGSNGDVAVDFY-NRYIQDIKNVKKMGrNA  104
U33817   DGKGPSTWDHFCHNFPEWI-VDRSNGDVAADSYHMYAEDVRLLKEMGMDA  146
A48860P  DGKGESNWDHFCHNHPERI-LDGSNSDIGANSYHMYKTDVRLLKEMGMDA  148
X78433   GGKGPSSWDNFCHSHPDRI-MDKSNADVAAKSYYMYKEDVRMLKEIGMDS  144
X89413   TCRGPALWDIYCRRYPERC-NND-NGDVAVDFFHRYKEDIQLMKNLNTDA  109
S52771P  GCRGPSLWOIYTKKFPHRV-KNH-MADVAVDFYHRFREDIKLMKKLNTDA  107
Q00326   -GRGV>AA*DG?SHRYPEKAGSDLKNGDTTCESYTRWQKDVDVMGELNATG  113
X79195   -GRGUfVWDGFTHRYPEKGGADLGNGDTTCDSYRTWQKDI.DVMEELGVKG  112
L11454   -GRGLNVWDSFTHRFPEKGGADLGNGDTTCDSYTLWQKDIDVMDELNSTG  111
LPH3HU   DGKGPSIWDNFTHT-PGSKVKDNATGDIACDSYHQLDADLNMLKALKVKA  90
A48969P  DGRGMSIWDTFAHT-PGKV-KKGDNGNVACDSYHRVEEDVQLLKDLGVKV  75
Q08638   DGAGMSIWKTFSHT-PGhAT-KNGDTGDVACDHYNRWKEDIEIIEKLGVKA  75
S45675   DGRTPSIMDTYART-PGRV-RNGDTGDVATDHYHRWREDVALMAELGLGA  87

CBGAA    LPKALEDSYGGWLS PQIINDeEAYXEICFRAFGDRVKYWATVNEPNL  194
L41869   LPIALHOQYLGWLS PKIVGAFADYXEFCFKVFGORVXWrrrNEPRV  208
U26025   LPQALEDEYGGFLS PNIVDHFRDYANLCFKKFGDRVKHWITLKSPYT  210
U39228   VPQALEEEYGGVLS PRIVYDFKAYAELCYKEFGURVKHWTTLNEPYT  200
X56733   VPQALEDEYRGFLG-- "Pi/IVDDFRDYAELCFKEFGDR'/KHWITLNEPWG  197
P26204   LPQVLEDEYGGFLN SGVINDFRDYTDLCFKEFGDRVRYWSTLNEPWV  207
X94986   TPOAIEDKYGGFLS ANIVKDYREYADLLFERFGDRVKFWMTFMEPW5  212
A48860P  VPQALEEKYGGFLDKSHKSIVEDYTYFAKVCFDNFGDiC/KMWLTnJEPQT  248
X89413   TPQDLEDSYGGFLSE- -- RIVKDFnEYADrv/FQEYGGKVKHWITFNEPWV  206
Q00326   LPQTLQDEYEGFL DRQI IQDFKDYAJDLCFKEFGGKVKHWITINQLYT  210
X79195   LPQSLQDEYEGFL DRTIIDDFKDYADLCFERFGORVKHWITINOLFT  209
L11454   LPQTLODEYNGFL"-KKTIVDDFKDYADLCFELFGDRVKNWrTINQLYT  208
LPH3HU   LPOALQDI-GGW---ENP.\LICLFDSYADFCFQTFGDRVKFWrrrFKEPMY  185
A48969P  LPQALQDQ-GGW- -- GSRITIDAFAEYAELMFKELGGKIKQWITFVEPWC  169
Q08638   LPFALQLK-GGW ANREIADWFAEYSRVLFENFGDRVKNWITLNEPWV  169
S45675   LPQELENP-GGW PERPTAERFAEYAAIAADALGORVKTWTTLKEPWC  181



CBGAA    P/PLGYTVGIFPPT-RCAAPKANPLCM-TGNCSSAEPYIAAHHVLIAKAS  242
L41869   VAAXGYDNGFHAPG-RCSK CP-AGGDSRTEPYIVTHNII LSHA-^  250
U26025   FSSSGYAYGVHAPG-RCSA-WQKLMCT-GGN-SATEPYLVTHHQLLAKAA  256
U39228   ISNHGYTIGIHAPG-RCS3-WYDPTCL-GCD-SGTEPYLVTHNLLLAHAA  246
X56733   VSW/AYAYGTFAPG-RCSD-WLKLNC7-GGD-SGREPYLAAHYQLLAHAA  243
P26204   FSNSGYALGTNAPG-RCSA-SNVAK PGD-SGTGPYIVTHNQILAHAE  251
X94986   LSGFAYDDGVFAPG-RCSS-WV>^RQCR-AGD-SATEPYIVA}^HLI.LAHAA  258
S35175   YVGFAHDDGVFAPG-RCSS-WVNRCCL-AGD-$ATEPYIVAHNLLLSHA.\  247
U33817   FCSVSYGTGVLAPG-RC5?---GVSCAV?TGNSLSEPYIVAHKLLRAHAE  289
A48860P  FTSFSYGTGVFAPG-RCS?---GLDCAYPTGNSLVEP'r:AGHNILLAHAE  294
X78433   FCGtGYGTGLHAPGARCSA---G>n'CVr?EEDALRNPYIVGHNLLLAHAE  288
X89413   FLHAGYDVGKKA?G-RCSSYVNA---KCQDGRSGYEAYLVTHNLLrSHAE  252
S52771P  FSRSAYDVGKKAPG-RCSPYIKDFCHLCQPGrSGFEAYWSHNLLVSHAE  253
Q00326   VPTRGYAIGTDAPG-RCSP-^<VDTKHRCYGG^;SSTEPYIVAHNCLLAHAT  258
X79195   VPTRGYALGTDAPG-RCSO-WVDK -RCYGGD3STEPYIVAHNCLLAHAT  255
L11454   VPTRGYALGTDAPG-RCSP-KIDV--RC?GG^i3STEPYrVAHNQLLAHAA  254
LPH3HU   LAWLGYGSGEFPPGVK- DPGWAPYRI AHTVI KAKAR  220
A48969P  KAFLSMYLG'/HAPGWK DLQLAIOVSHHLLVAHGR  203
Q08638   VAIVGHLYG's/HAPGMR DI YVAFRAVHNLLRAH.-P  203
S45675   SAFLGYGSGVHAPGRT DPVAALRAAHHLNLGKC^  215

U39228   AVKLYREKYQASQ£GVIGITWSHWFEPASESQ-KDINASVRALDFMYGW  295

X56733   FMHPLTK-GRYPESMR YLVRKRLPKFSTEESKELTGSFDF  331
P26204   FMEOLTT-GDYSKSMR RIVKNRLPKFSKFESSLVNGSFDF  340
X94986   WMDPITY-GRYPRTVQ YLVGNRLLNFTEEVSHLLRGSYDF  346
S35175   WMDPtfTY-GRYPRTMV DLAGDKLIGFTDEESOLLRGSYDF  335
U33817   FLEPW R-GDYPFSMR VSARDRVPYFKEKEQEKLVGSYDM  376
S52771P  KLDPTTY-GDYPQSMK DAVGARLPKFTKAQKAKLKGSADF  339
L11454   FMGPLTE-GKYPDIMR EYVGDRLPEFS "EAALVKGSYDF  342
LPH3HU   FAHPIFWiGDYPDTMKWKVGNRSELQHLATSRLPSFTEEEKRFIRATADV  320
A48969P  YLDPIYF-GEYPKFM LrWYENLGY KPPr/DGOMELIHQPI DF  290
Q08638   FLNPIYR-GOYPELV LE-FAR-EY LPEMYKDDMSEIQEKI DF  289
S45675   rTGPMLQ-GAYPEDL VKDTAGLTD WS r/ROGOLRLAHQKLDF  303

CBGAA    MGrNHYTTLYAT5TPPLSPDHTQ--YLYPDSRVYLTGERHGV-SIGERTG  377
L41869   VGINQYTSYYMKOPGAWNQTPVS--YQ-OOWHVGFVYERNGV-PIGPBAN  384
U26025   IGLNYYTTRYASNAPKITSVHA--SYITDPQVNAT-AELKGV-PIGPMAA  390
U39228   IGVNYYSARYASAYPEDYSIPTPPSYLTDAYVNVT-TELNGV-PIGPQAA  382
X56733   LGLMYYSSYYAAKAPRI--PNARPAIQTOSLINAT-FEHNGK-PLGPMAA  377
P26204   IGINYYSSSYISNAPSH--GNAKPSYSTNPKTNIS-FEKHGI-PLGPRAA  386
X94986   IGLQYYTSYYAKPNAPYDtNHIR--YLTDNRVTETPYDYNGN-LIGPQAY  393
S35175   VGLQYYTAYYAEPIPPVDPKFRR--YKTDSGVNATPYDLNGM-LIGPCAY  382
U33817   IGINYYTSTFSKHID-LSPNNSPVXNTDDAYASQETKGPDGM-AIGFPTG  424
A48860P  LGLNYYTSRFSKNID-ISPNYSPVai4TDDAYASQEVNGPDGK-PIG?PMG  429
X78433   VGINYYTSRFAKHID-ISPEFIPKINTDDVYSNPEVNDSNGI-PIGPDVG  423
X89413   VQLNYYTSVFSNHLEK--PDPSKPRWMQDSLITWES»aJAQ-NYAIGSKPL  387
S52771P  VGINYYSSFYAKASEK--PDYRQPSWATDSLVEFEPK7VDGSVKIGSQPS  387
Q00326   LGiJ^Y'/VTOVAQPKPNPYPSETHTA-MMDAGVKLTYOKSRGEFL-GPLFV  394
X79195   LGLNYYVTQYAHALDPSPPEKL-TA-MTOSLANLTSLDANGQPP-GPPF-  388
L11454   LGLNYYVTQYAQNNQTrVPSDVHTA-LMDSRTTLTSKIIATGHAP-GPPF-  389
LPH3HU   FCLNTY YSRIVQHKTPRLNPPSYEDD-- QEMAEEECPSWPSTAHNRA  365
A48969P  IGINYYTSSMNRYNPGEAGGMLSSE/,ISMGAP KTD  325
Q08638   VGUi YYSGHLVKFDPDAPAKV S FVERDL P KTA  321
S45675   LGVNYYSPTLVSEADGSGTHNSDGHGRSAHSPWPGADRVAFHQPPGETTA  353



CBGAA    MDGLFWPH--- GIQK1VEYVKEFYDNPTII-IAENGYPESE--ESSST  420
L41869   SDWLYIVPW GMNKAVTYVKERYGNPTMI- ISENGMDOP GNVS  425
U26025   SGWLYVYP KGIHDLVLYTKEKYNDPLI Y-ITENGVDEFN-- DPKLS  433
U39228   SDWLYVYP KGLYDLVLYTKNKYNDPIMY-ITENGMDEFK--NPKI S  425
X56733   SSMLCIYP OGIRKLLLYVKNHYMNPVI Y- ITENGRNSST-- INTV-  419
P26204   SIWIYVY?'fMriQEDFEIFCYILKTNrTILQFSnXNGMNEFN--DATLP  434
X94986   SDWFYIFP ESIRHLT.NYTKDTYNDPVI Y-ITZNGVDNQN-- NETEP  436
S35175   SSWFYIFP KG! RHFLN'rTKDTYNOPVI Y-^/TENGVDNYN-- rfESQP  425
U33817   NAWI>IMYP KGLHDILMTMKNKYGNPPMY-ITENGMGDIDKGDLPKP  469
A48860P  NPWIYMYP EGLKDLLMIMKNKYGNPPI Y- ITSNGIGDVDTKETPLP  474
X78433   MYFIYSV? KGIKNI LLRMKEKYGNPPI Y- ITENGTADMDGWGMP- ?  467
X89413   TAALMVYS RGFRSLLKYI KDKYAJIPEIM- IMENGYGEELGASDSV-A  432
S52771P  7AKMAVYA AGLRKLVKYI KDRYGNPEI X - ITENflYGEDLGEKDTDHS  433
Q00326   EDKV>fGNSYYYPKGrYYVMDYFKTKYGO?LIY-VTXNG---FST?SSEMR  440
X79195   SKG3YYHPRGML.^n/MEHFKTKYGD?L:Y-VTENG---FSTSGG?I?  430
L11454   MAASYYYPKGIYYVMDYFKTTYGOPLIY-VTENG FSTPGDE-0  430
LPH3HU   PW GTRRLLNWI KEEYGDI PI Y- ITZNCyGLTNPNT  400
A48969P  ICWE 1 YAECLYDLLRYTADXYGNPTLY- ITENGA CYMDGLS  365
Q08638   MGWE IVPEGI YWI LKKVKEEYNPPEVY- ITENGA AFDOWS  361
S45675   MGWA VDPSGLYELLRRLSSOFPALPLV- ITENGA AFHDYAD  393

CBGAA    LQENU^DVRRIRFHGDCLSYLSAAIKN-GSDVRGYFVWSLLDNFEWAFGY
L41869   iADGVhOTVRIRYYRDYITELKKAIDN-GARVAGYFAWSLLDNFr/?RLGY
U26025   MEEALKDTNRI DFYYRHLCYLQAAI KK-GSKVKGYFAWSFLDNFEWDAGY
U39228   LEQALNDSNKIDYCYRHLCYLQEAIIE-GANVQGYFAWSLLDNFEWSEGY
X56733   TSRIPF
P26204   VEEALLNTYKIDYYYRHLYYIRSAIRA-GSNVKGFYAWSFLDCNBJFAGF
X94986   IQDAVKDGFRIEYHRKHMWNALGSLKBYHVNLKGYFAWSYLDKFCTNIGY
S35175   lEEALQDDFRISYYKKHKWNALGSLKNYGVKLKGYFAWSYLDNFEWNIGY
U33817   V-- ALEDHTRLDYIQRHLSVLKQSIDLGAD-VRGYFAWSLLDNFEWSSGY
A48860P  MEDALNDYKRLDYIQRHIATLKZSIDLGSN-VQGYFAWSLLDNFEWFAGF
X78433   ^f^DPLDOPLRIEYLOQHMTAIKEAIDLGRRTLRGHFr-^SLIDNFEWSLGY
X89413   AV-GTADHNRKYYLQRHTaSMQEAVCIDKVNVTGYFVWSLLDNFEWQDGY
S52771P  SV-ALKDHNRKYYHQRKLLSLHQAICEDKVNVTSYFVWSLMDNFEWLDGY
Q00326   -EQAIADYKRIDYLCSHLCFLRKVIKEKGVKV.'RGYFAWALGDNYEFCKGF
X79195   FTEAFHDYNRIDYLCSHLCFLRKAIKEKRVlfVKGYFVWSLGDNYEFCNGY
L11454   FEKATADYKRIDYT.CiiHLCFLSKVIKEKNVNVKGYFAWSLGDNYEFCNGF
LPH3HU   EDTDRIFYHKTYINEALKAYRLDGIDLRGYVAWSLMDNFEWLNGY
A48969P  LDGRIHDQRRIDYLAMHLIQASRAIED-GINLKGYMEWSLMDNFEWAEGY
Q08638   EDGRV-HDCNRIDYLKAHIGOAWKAIQE-GVPLKGYFVWSLLDNFEWAEGY
S45675   PEGKVNDPEniAYVRDHLAAVHRArKD-G5DVRGYFLWSLLDNFEWAHGY

CBGAA    TIRFGLYHVDFIS-DCKRYPKLSACWFRQFLQHDDQGS  506
L41869   TARFGIVYVDF-N-TLKRYPKDSALWFKNMLSEKKRS  509
U26025   TVRFGINYVDYND-NLKRHSKLSTYWFTSFLKKYERSTKEIQMFVESKLE  531
U39228   TVRFGINYVDYDN-GLKRHSKLSTHWFKNFLKRSSISKEKIRRCGNNNAR  523
X56733   425
P26204   TVRFGLNFVD  493
X94986   TARFGLYYVDYNN-NLTRIPKDSAYWFKAFLN-PENITKTTRTVSWDSRK  534
S35175   TSRFGLYYVDYKN-NLTRYPKKSAHWFTKFLNISVNANHIYELTSKDSRK  524
U33817   TERFGIVYVDREN-GCERTMKRSARWLQEF NGAAKKVE  553
A48860P  TERYGIVYVDRNN-NCTRYMKESAKWLKQF NAAKKP--  558
X78433   LSRFGIVYIDRND-GCKRIMKKSAKWLKEF --NGATKKLN  554
X89413   KNRFGLYYVDFKN-NLTRYEKESGKYYKDFLSQGVRPSALKKDE  523
S52771P  TARFGLYYIDFQN-NLTR-MEKESATCS LNSSNRA  514
Q00326   TVRFGLSYV>fWEDL-DDP>fLKESGKWYQRFIN GT^/K>1AVKQDFL  532
X79195   TVRFGLSYVOniNVTADRDLKASGLWYQSFLR DTTKNQDIL  521
L11454   TVRFGLSYVDFANITGDRDLKASGKWFQKFIN VTDEDSTNODLL  524
LPH3HU   TVKrGLYHVOFNNTHRPRTAHASARYYTEVITNNG--  480
A48969P  GMRFGLVHVDYDTL--VRTPKDSFYWYKGVISRGIVL-D  449
Q08638   SKRFGIV"r^YSTQ--KRIVKDSGYWYSNWKNNGL-E  445
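In each block of the alignment above, the number that closes a row appears to be the position of the last residue shown in that sequence. It advances by the number of residues in the row, and gap dashes and layout spaces are not counted. For example, CBGAA goes from 29 to 51 to 99 over its first three rows. The Python sketch below recomputes those end positions under that assumption. The function name `row_end_positions` is hypothetical. Because the function counts only letters, OCR debris such as `5` or `/` standing in for a residue will make its totals fall short of the printed numbers.

```python
def row_end_positions(start: int, rows: list[str]) -> list[int]:
    """Return the end position printed after each successive alignment row.

    `start` is the position reached before the first row. Only letters are
    counted as residues; '-' gaps and spaces are skipped (an assumption about
    how the figure is numbered).
    """
    positions = []
    pos = start
    for row in rows:
        pos += sum(1 for ch in row if ch.isalpha())
        positions.append(pos)
    return positions


if __name__ == "__main__":
    # CBGAA: its first row ends at 29; these are the residues of its next two rows.
    cbgaa_rows = [
        "fpS DFMFGTASSAYQYEGAVRE",
        "DGKGPSTWDALTHM-PGRI-KDSSNGDVAVDCYHRYMEDIELMASLGLDA",
    ]
    print(row_end_positions(29, cbgaa_rows))  # [51, 99]
```

The same check also reproduces U39228 going from 29 to 54 across its second row.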